Re: Japanese Concordancers?

Philip Bernick (pbernick@crl.nmsu.edu)
Mon, 29 Jul 1996 14:22:42 -0600 (MDT)

On Wed, 24 Jul 1996, Vincent Ooi wrote:

> Does anyone know whether there are (free/shareware) concordancers which
> are able to handle Japanese text (kana & kanji characters)?

The Computing Research Laboratory has software for Unix and
X-Windows machines called X-Concord that will do this.

The X-Concord program is a concordance tool that lets users run KWIC (Key
Word In Context) searches on text in as many as 17 languages. It is
designed to be easy to work with, so that teachers and students can use
X-Concord in the classroom to identify relevant texts by viewing words and
expressions in context.

Searching is quick, and the size of the corpus is limited only by
available disk space. Using an implementation of the Boyer-Moore search
algorithm specially adapted for wide characters, X-Concord can search at
over 1 MB per second, which eliminates the need for pre-indexing on many
moderate-scale corpora.
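X-Concord's own search code is not shown here. As a rough illustration of the idea only, the following is a minimal Python sketch of Horspool's simplification of Boyer-Moore (bad-character rule only), not the full algorithm and not CRL's adaptation. It assumes the text has already been decoded to a Python str, so every character, kana and kanji included, is a single element. The wide-character issue shows up in the shift table: a byte-oriented version can use a 256-entry array, but with thousands of possible characters this sketch keys a dictionary by character and stores only the characters that occur in the pattern. The function names are hypothetical.

```python
def build_shift_table(pattern: str) -> dict[str, int]:
    """Bad-character shifts keyed by character rather than by byte value.

    Every character except the last gets its distance from the end of the
    pattern; later occurrences overwrite earlier ones, so the rightmost
    occurrence (the smallest safe shift) wins.
    """
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every match, overlapping matches included."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shift = build_shift_table(pattern)
    hits = []
    pos = 0
    while pos <= n - m:
        # Compare right to left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Shift on the text character under the pattern's last position;
        # characters that do not occur in the pattern allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits


text = "漢字とかなの混じった日本語の文。日本語を検索する。"
print(horspool_find_all(text, "日本語"))  # [10, 16]
```

Because the table holds at most one entry per distinct pattern character, its size does not depend on how large the character set is, which is what makes this style of search practical for kanji.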

Searching is very flexible. Users can match any string against any part of
a word or phrase. Users can also limit the search to only those
concordance lines whose context to the left or right of the keyword either
contains or lacks specified strings. X-Concord shows the results in a KWIC
display and also displays the complete sentence for the selected KWIC
line. The complete document is displayed in yet another window. X-Concord
provides easy ways to save individual sentences or complete documents to
new text files. Users can then edit these files or use X-Concord to print
the results.
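The context filters and the sentence view described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not X-Concord's interface: context is measured in characters (the note does not say whether X-Concord counts characters or words), each filter is a plain substring test, and a sentence is assumed to end at one of a small, hypothetical set of terminator characters. The names kwic_lines(), sentence_at(), KwicLine and the left_has/right_has/left_lacks/right_lacks parameters are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SENTENCE_END = "。．！？.!?"  # hypothetical set of sentence terminators


@dataclass
class KwicLine:
    start: int    # character offset of the keyword in the text
    left: str     # up to `width` characters before the keyword
    keyword: str
    right: str    # up to `width` characters after the keyword


def kwic_lines(text: str, keyword: str, width: int = 10,
               left_has: Optional[str] = None, right_has: Optional[str] = None,
               left_lacks: Optional[str] = None,
               right_lacks: Optional[str] = None) -> list[KwicLine]:
    """Collect KWIC lines, keeping only those whose contexts pass every filter."""
    lines: list[KwicLine] = []
    if not keyword:
        return lines
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        if ((left_has is None or left_has in left)
                and (right_has is None or right_has in right)
                and (left_lacks is None or left_lacks not in left)
                and (right_lacks is None or right_lacks not in right)):
            lines.append(KwicLine(start, left, keyword, right))
        start = text.find(keyword, start + 1)
    return lines


def sentence_at(text: str, pos: int) -> str:
    """Return the sentence containing text[pos], terminator included."""
    begin = pos
    while begin > 0 and text[begin - 1] not in SENTENCE_END:
        begin -= 1
    end = pos
    while end < len(text) and text[end] not in SENTENCE_END:
        end += 1
    return text[begin:end + 1].strip()


text = "The cat sat. A dog ran past the cat. The cat slept."
for line in kwic_lines(text, "cat", width=8, left_has="the"):
    print(f"{line.left!r} [{line.keyword}] {line.right!r}")
    print("  sentence:", sentence_at(text, line.start))
```

Keeping the keyword's offset in each KWIC line is what lets the sentence (or the whole document) be recovered later for whichever line the user selects.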

Highlights of X-Concord

·   Supports documents conforming to the Tipster Document Architecture as
    well as flat text files.

·   Searches multilingual text.

·   Supports around 17 different languages for both input and
    display.

·   Any string can be specified as the "keyword."

·   The search periodically reports its status to the user interface so
    that search progress can be displayed.

·   The search can be constrained by specifying strings that may or may not
    appear in the context surrounding the "keyword."

·   The size of the context surrounding the "keyword" is configurable for
    both the left and right sides of the "keyword."

·   The number of successful "hits" after which the search stops is
    configurable.
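Three of the options above can be sketched together: left and right context widths, a limit on hits, and periodic progress reports. The hypothetical Python function below scans a list of already-decoded documents and, if given a callback, reports progress once per document. That is one simple way a search engine could feed a progress display; how often X-Concord itself reports is not stated in this note. search_corpus(), report() and the parameter names are illustrative only.

```python
from typing import Callable, Optional

# (document index, character offset, left context, keyword, right context)
Hit = tuple[int, int, str, str, str]


def search_corpus(docs: list[str], keyword: str,
                  left_width: int = 20, right_width: int = 20,
                  max_hits: Optional[int] = None,
                  progress: Optional[Callable[[int, int], None]] = None
                  ) -> list[Hit]:
    """Scan documents in order, stopping once max_hits hits have been found.

    After each document is scanned (or when the hit limit ends the search),
    progress(documents_done, documents_total) is called if it was given.
    """
    hits: list[Hit] = []
    total = len(docs)
    if not keyword or (max_hits is not None and max_hits < 1):
        return hits
    for i, doc in enumerate(docs):
        start = doc.find(keyword)
        while start != -1:
            end = start + len(keyword)
            hits.append((i, start, doc[max(0, start - left_width):start],
                         keyword, doc[end:end + right_width]))
            if max_hits is not None and len(hits) >= max_hits:
                if progress is not None:
                    progress(i + 1, total)
                return hits
            start = doc.find(keyword, start + 1)
        if progress is not None:
            progress(i + 1, total)
    return hits


def report(done: int, total: int) -> None:
    print(f"searched {done}/{total} documents")


docs = ["a cat and a cat", "no match here", "cat"]
for hit in search_corpus(docs, "cat", left_width=2, right_width=4,
                         max_hits=2, progress=report):
    print(hit)
```

Passing progress as a callback keeps the search engine independent of any particular user interface: a graphical front end and a command-line wrapper can each supply their own.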

Implementation

X-Concord can be run as a stand-alone program or integrated with other CRL
applications such as Oleada or Cibola. Internally, X-Concord's search
engine is designed to support either a command-line or a programmatic
interface. X-Concord itself uses the programmatic interface, but the
search engine is an independent module.

Other programs can incorporate the search capabilities of X-Concord either
by using the search engine as a subroutine library or by invoking the
command-line version of the program.
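The library-plus-command-line design can be illustrated with a small Python module. This is a sketch of the general pattern, not CRL's code: the search lives in an ordinary function, find_hits(), that another program can import, and a thin main() parses command-line arguments and calls that same function. All names, including the --width option, are hypothetical.

```python
import argparse
from typing import Optional, Sequence


def find_hits(text: str, keyword: str, width: int = 20) -> list[str]:
    """Library entry point: one formatted KWIC line per occurrence of keyword."""
    lines: list[str] = []
    if not keyword:
        return lines
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - width):start]
        lines.append(f"{left}[{keyword}]{text[end:end + width]}")
        start = text.find(keyword, start + 1)
    return lines


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Command-line entry point: parse arguments, then call the library function."""
    parser = argparse.ArgumentParser(description="Toy KWIC search (illustration only)")
    parser.add_argument("keyword")
    parser.add_argument("files", nargs="+")
    parser.add_argument("--width", type=int, default=20)  # hypothetical option
    args = parser.parse_args(argv)
    for path in args.files:
        with open(path, encoding="utf-8") as f:
            for line in find_hits(f.read(), args.keyword, args.width):
                print(f"{path}: {line}")
    return 0
```

Another program would import find_hits() directly. A standalone script would end with the usual guard, if __name__ == "__main__": raise SystemExit(main()), and could then be run as, for example, python kwic.py 日本語 corpus.txt --width 10 (script and file names hypothetical). Because main() accepts an argument list, the command-line path can also be exercised from code.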

Some of the input types supported are:

Arabic
Armenian
Cyrillic
Ethiopic
Farsi/Pashto
Georgian
Hebrew
Japanese
Korean
Lao
Latin-1
SerboCroat
Simplified Chinese
Thai
Traditional Chinese
Vietnamese
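This note does not say how X-Concord represents these scripts internally. One common approach today, and not necessarily X-Concord's, is to decode each document from its legacy encoding to Unicode before searching. A single search routine then handles every script, and a match can never begin in the middle of a multibyte character, as a naive byte-level search over EUC-JP or Shift-JIS can. The sketch below relies on Python's standard codecs. The ENCODINGS table and the function names are hypothetical, and the table covers only some of the input types listed above. Anything not in the table is assumed to be UTF-8.

```python
# Hypothetical mapping from some of the input types above to Python codec
# names for common legacy encodings; X-Concord's actual encodings may differ.
ENCODINGS = {
    "Japanese (EUC)": "euc_jp",
    "Japanese (Shift-JIS)": "shift_jis",
    "Korean": "euc_kr",
    "Simplified Chinese": "gb2312",
    "Traditional Chinese": "big5",
    "Cyrillic": "koi8_r",
    "Arabic": "iso8859_6",
    "Hebrew": "iso8859_8",
    "Thai": "tis_620",
    "Latin-1": "latin_1",
}


def decode_document(raw: bytes, input_type: str) -> str:
    """Decode raw bytes to str; unknown input types are assumed to be UTF-8."""
    return raw.decode(ENCODINGS.get(input_type, "utf-8"))


def find_in_document(raw: bytes, input_type: str, keyword: str) -> list[int]:
    """Return character offsets of keyword, searching decoded text, not bytes."""
    text = decode_document(raw, input_type)
    offsets: list[int] = []
    if not keyword:
        return offsets
    start = text.find(keyword)
    while start != -1:
        offsets.append(start)
        start = text.find(keyword, start + 1)
    return offsets


raw = "漢字と日本語".encode("euc_jp")
print(find_in_document(raw, "Japanese (EUC)", "日本語"))  # [3]
```

The offsets count characters rather than bytes, so the same keyword yields the same position whether the document arrived as EUC-JP or Shift-JIS.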

For more information, please contact:

Bill Ogden ogden@crl.nmsu.edu
or Philip Bernick pbernick@crl.nmsu.edu
The Computing Research Laboratory
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

505.646.5466