koyori (Bilingual KWIC)

Koyori is a visualization tool for word alignment. When koyori reads a sentence-aligned parallel corpus and receives a keyword, it determines the keyword's translation equivalent and shows the texts that include the keyword and the equivalent in KWIC (keyword in context) format. We call this display method "Bilingual KWIC."

Requirements

Install

Install Sary and sary-ruby before installing koyori.

koyori-1.2.tar.gz

Koyori uses setup.rb as its installer. Download the archive file and decompress it. Then enter its top directory and type:

($ su)
# ruby setup.rb

This simple step installs the program under the default location for Ruby libraries. You can also install the files into a directory of your choice by passing options to setup.rb. Try

ruby setup.rb --help

When you run koyori, it creates the following files in your home directory.

~/.koyorirc     : Configuration File
~/.koyoridic_je : Dictionary File from Japanese to English
~/.koyoridic_ej : Dictionary File from English to Japanese

Preparation

Before using koyori, you need to prepare and compile a parallel corpus. Koyori can handle various languages, but by default it is set up for a Japanese and English corpus. Prepare two text files, one written in Japanese and the other in English. Corresponding sentences must be on the same line in both files. If one English sentence is translated into two Japanese sentences, write both Japanese sentences on one line.
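
Because the two files must line up one to one, it can help to check them before compiling. The following is a minimal sketch, not part of koyori; check_alignment is a hypothetical helper that takes the lines of both files as arrays and reports obvious mismatches.

```ruby
# Hypothetical helper (not shipped with koyori): report alignment problems
# between the lines of a Japanese file and the lines of an English file.
def check_alignment(ja_lines, en_lines)
  problems = []
  if ja_lines.size != en_lines.size
    problems << "line counts differ: #{ja_lines.size} vs #{en_lines.size}"
  end
  # zip stops at the shorter side or pads with nil; skip pairs with a missing side.
  ja_lines.zip(en_lines).each_with_index do |(ja, en), i|
    next if ja.nil? || en.nil?
    if ja.strip.empty? != en.strip.empty?
      problems << "line #{i + 1}: one side is blank"
    end
  end
  problems
end

# Two Japanese sentences that translate one English sentence share a line.
ja = ["こんにちは。", "元気ですか。調子はどうですか。"]
en = ["Hello.", "How are you doing?"]
puts check_alignment(ja, en).empty? ? "aligned" : "not aligned"
```

To run the check on real files, read each one with File.readlines(path, chomp: true) and pass the resulting arrays to the helper.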

Type the command make_koyori_index filename, and filename.kyr and filename.kyr.ary are created. The file filename.kyr is a copy of filename with line numbers added; this is the file you load into koyori. The file filename.kyr.ary is used by Sary.

make_koyori_index accepts the following options.

--euc                        consider the file written in EUC code.
-s, --sjis                   consider the file written in Shift-JIS.
-u, --utf                    consider the file written in UTF-8.
-w, --word                   consider words in the file are separated by space.
-c, --character              consider words in the file are not separated by space.
-e, --english                consider the file is written in English. (same as -w)
-j, --japanese               consider the file is written in Japanese. (same as -c)
-t, --title                  consider titles are added to each line.
-h, --help                   print help message.

To compile an English corpus, type make_koyori_index -w filename . To compile a Japanese corpus written in Shift-JIS, type make_koyori_index -c -s filename .

The character encoding of filename.kyr is UTF-8 regardless of the encoding of the original file.

Sample files are as follows:

The original versions are available at the following URL.

After downloading the files above, run the following commands.

make_koyori_index -e the_rights_of_the_child.en.txt
make_koyori_index -j the_rights_of_the_child.ja.txt

Then load the resulting files by clicking [File] in the menu.

Usage

Load corpus file

Before you start searching, click [File] and open the corpus files. These are the files created by make_koyori_index, with the extension "kyr."

Search

Enter a keyword or key phrase in the Keyword field and click the [Search] button or press the Enter key. Koyori displays sentences from the source language in the left-hand window and their translations in the right-hand window, one pair per line. The input keyword is centered in the source texts and its equivalent is centered in the translations; that is, both texts are displayed in KWIC (KeyWord In Context) format. Koyori calculates the equivalent of the input keyword automatically. The input keyword and the calculated equivalents are displayed in blue.
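
To illustrate what KWIC format means, here is a minimal sketch that lines up a keyword at a fixed column. It is not koyori's display code; kwic_line and its width parameter are hypothetical, and square brackets stand in for the blue highlighting.

```ruby
# Hypothetical sketch of KWIC formatting; koyori's own layout may differ.
# Places the first occurrence of `keyword` at a fixed column, with up to
# `width` characters of context on each side. Returns nil if there is no match.
def kwic_line(sentence, keyword, width = 20)
  pos = sentence.index(keyword)
  return nil if pos.nil?
  left  = sentence[0, pos]
  right = sentence[(pos + keyword.length)..-1]
  left  = left[-width..-1] if left.length > width   # keep the nearest context
  right = right[0, width]
  "#{left.rjust(width)} [#{keyword}] #{right.ljust(width)}"
end

sentences = [
  "The child has the right to education.",
  "Every child shall be registered after birth.",
  "States Parties shall respect the rights of each child."
]
sentences.each { |s| puts kwic_line(s, "child") }
```

The sketch counts characters, so it only aligns correctly in a fixed-width font where every character has the same width; full-width Japanese characters would need extra handling.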

If koyori calculates multiple equivalents, the sentences that include the most frequent one are displayed first, followed by the remainder. If koyori cannot find an equivalent in a sentence, that sentence is displayed in gray. (Currently, only the target sentence is shown in gray.)
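
The display order described above can be sketched as follows. This is an illustration only: order_by_equivalent is a hypothetical function, the row layout is an assumption, and ordering the remaining equivalents by decreasing frequency is a guess rather than documented behavior.

```ruby
# Hypothetical sketch: each row is [source, target, equivalent], where
# equivalent is nil when none was found in the target sentence.
# Rows with the most frequent equivalent come first; rows without an
# equivalent (shown in gray by koyori) come last.
def order_by_equivalent(rows)
  counts = Hash.new(0)
  rows.each { |_, _, eq| counts[eq] += 1 unless eq.nil? }
  indexed = rows.each_with_index.sort_by { |(_, _, eq), i|
    # The original index i keeps the order stable within each group.
    eq.nil? ? [1, 0, "", i] : [0, -counts[eq], eq, i]
  }
  indexed.map(&:first)
end

rows = [
  ["the child ...",   "kodomo wa ...", "kodomo"],
  ["a child ...",     "...",           nil],
  ["every child ...", "jidou wa ...",  "jidou"],
  ["each child ...",  "jidou no ...",  "jidou"]
]
order_by_equivalent(rows).each { |r| puts r.inspect }
```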

The calculated equivalent is also displayed in the Equivalent field. If you think the equivalent is wrong, you can enter a new equivalent in the field and click the [Align] button or press the Enter key. Koyori then redisplays the sentences that contain it, with the input equivalent displayed in red. If some target sentences do not contain the input equivalent although their source sentences contain the input keyword, koyori calculates a new equivalent from those remaining sentences and displays it in blue.

In addition, pressing the cursor keys in the Equivalent field steps through the other equivalent candidates that koyori has calculated.

The numbers displayed to the right of the Keyword and Equivalent fields show the number of occurrences of the keyword and of the equivalent candidate in the corpus, respectively.

Registration into the Dictionary

Since version 1.1, koyori has had a word registration function. Right-click a sentence that includes the keyword or the equivalent that you want to register, and koyori opens the Dictionary editing window. You can also register a phrase. Koyori allows users to register the following items in the bilingual dictionary.

Word
Enter the word or phrase you want to register. When the Dictionary editing window opens, the Word field is filled with the input keyword. You can edit the word if you need to. In addition, you can extend or shorten the phrase in the field by clicking the [<] or [>] buttons on both sides. If you edit the word or phrase in the field, you can push the [Reload] button to look it up in the dictionary again.
Part of Speech
Input a part-of-speech of the word.
Pronunciation
Input a pronunciation of the word or phrase. At present, you can register neither a part-of-speech nor a pronunciation for the equivalent.
Equivalent
Input the equivalent. When the Dictionary editing window opens, this field is filled with the equivalent (the colored text) in the clicked sentence. As with the Word field, you can edit the contents by using the [<] and [>] buttons.
Usage
Input usage of the word or phrase, especially in the case when the word has some equivalents and the users want to describe why one equivalent is correct while others are not.
Example
Input an example phrase or sentence where the word or phrase is used. This field is paired with the Translation field.
Translation
Input a translation of the example. When you click the [Edit] button, koyori fills the Example and Translation fields with the input keyword and its equivalent, respectively, just as for the Word and Equivalent fields. You can also extend or shorten the example or the translation by clicking the [<] or [>] buttons on both sides.
Source
Input a source of the example.
Comment
Input a comment if necessary.
Others
If the keyword in the Word field has other equivalents in the dictionary, they are displayed in this field. When you click the [!] button, koyori displays details of those equivalents, such as an example.

You do not need to fill in every field. At minimum, you must fill in the Word and Equivalent fields.

When you push the [Add] button, this information is registered in the dictionary. To delete an equivalent, click the [Delete] or [Delete All] button. If several equivalents are registered for the word, the [Delete] button deletes only the equivalent entered in the Equivalent field. The [Delete All] button, on the other hand, deletes the word and all its equivalents from the dictionary. To close the Dictionary editing window without making any changes, click the [Cancel] button.

Additional Functions

Display the whole sentences

Click a specific sentence in the left- or right-hand window, and koyori displays the full text of the sentence and its translation in the bottom window.

Sort

Click the [<=Sort] or [Sort=>] button, and koyori sorts the displayed sentences so that examples or translation patterns are easier to compare. Sorting can be performed using any of four sort keys: the words before, or after, the input keyword; or the words before, or after, the equivalent.
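
As an illustration of sorting by context, the sketch below sorts sentences by the words before or after a keyword. It is not koyori's implementation; context_key and sort_by_context are hypothetical names, and the sketch assumes space-separated words, so it suits English rather than Japanese text.

```ruby
# Hypothetical sketch of two of the four sort keys (those around the keyword).
# side is :after (text following the keyword) or :before (words preceding
# the keyword, nearest word first).
def context_key(sentence, keyword, side)
  pos = sentence.index(keyword)
  return [1, ""] if pos.nil?   # sentences without the keyword go last
  if side == :after
    [0, sentence[(pos + keyword.length)..-1].strip.downcase]
  else
    [0, sentence[0, pos].split.reverse.join(" ").downcase]
  end
end

def sort_by_context(sentences, keyword, side)
  sentences.sort_by { |s| context_key(s, keyword, side) }
end

sentences = [
  "the right of the child to education",
  "every child has rights",
  "a child shall be registered"
]
puts sort_by_context(sentences, "child", :after)
puts sort_by_context(sentences, "child", :before)
```

Sorting on the equivalent side would work the same way, applied to the target sentences with the equivalent in place of the keyword.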

Using Word List

If you want to register many words, it is convenient to prepare a list of the words and load it into koyori. The word list should have one word on each line. When you load it from the menu [File]->[Load Word List], the first word of the list is displayed in the Keyword field. When you push the [Next] button or press the cursor key, the next word on the list is displayed.
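
A word list in this one-word-per-line layout can be produced with a few lines of Ruby. The sketch below is only a convenience; word_list_text is a hypothetical helper, and dropping blanks and duplicates is a choice made here, not something koyori requires.

```ruby
# Hypothetical helper: turn an array of words or phrases into word-list text,
# one entry per line, with surrounding spaces, blank entries, and duplicates removed.
def word_list_text(words)
  words.map(&:strip).reject(&:empty?).uniq.join("\n") + "\n"
end

print word_list_text(["child", " right ", "", "education", "child"])
```

To save the list, write the returned string to a file with File.write and then open that file from [File]->[Load Word List].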

Option

By clicking [Option] in the menu or checking the buttons at the top of the window, you can set koyori's parameters.

Application to Other Languages

By default, koyori works with a Japanese-English bilingual corpus, but it can also handle other languages.

To use other languages, edit the configuration file "~/.koyorirc". If you change $language[0] = 'Japanese' to $language[0] = 'French' and restart koyori, you can work with a French-English corpus. If necessary, also change the values of $user_dictionary_file[0] and $user_dictionary_file[1]. The character encoding of the corpus should be UTF-8.
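
The edited part of the configuration file might look like the fragment below. Only the variable names and the 'French' value come from the description above; the dictionary paths, and the idea that each value is a plain path string, are assumptions made for illustration.

```ruby
# Hypothetical excerpt of ~/.koyorirc for a French-English corpus.
# The two ||= lines exist only so this fragment runs on its own;
# they are not expected to appear in the real file.
$language ||= []
$user_dictionary_file ||= []

$language[0] = 'French'   # was 'Japanese'

# Assumed paths and file names: separate dictionaries keep the French
# entries apart from the default Japanese-English ones.
$user_dictionary_file[0] = '/home/you/.koyoridic_fe'
$user_dictionary_file[1] = '/home/you/.koyoridic_ef'
```

Keeping separate dictionary files per language pair avoids mixing entries when you switch back to the Japanese-English setting.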

Copyright

(C) Copyright 2005 OGAWA Yasuhiro

If you have any problems, please e-mail the following address. mailto: koyori@kl.i.is.nagoya-u.ac.jp