[Corpora-List] Searching Japanese corpora

From: Eric J. M. Smith (eric.smith@utoronto.ca)
Date: Thu Dec 21 2006 - 00:35:09 MET

    Greetings,

    Following up on our recent thread about grep with Unicode, I'm curious
    about how people search for text in Japanese-language corpora.

    My understanding of Japanese is rudimentary, but is it not possible
    (potentially, at least) for the same text to be written in hiragana,
    katakana, or kanji? To find all occurrences of a particular string
    in a corpus, would I have to run the search three times, once for
    each script? I assume that would be the case for something like grep.
    But are there more sophisticated query tools that abstract away the
    question of which script is actually used for the data in the corpus?
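
    To make the question concrete, here is a rough sketch of the kind of
    script-insensitive matching I have in mind, covering only the
    hiragana/katakana half of the problem. It relies on the fact that the
    Unicode katakana letters U+30A1..U+30F6 sit exactly 0x60 above their
    hiragana counterparts. The function names fold_kana and grep_kana are
    my own inventions, not part of any existing tool, and kanji are left
    untouched, since mapping kanji to kana would presumably need readings
    from a dictionary or morphological analyser.

```python
import unicodedata

# Katakana letters U+30A1..U+30F6 mirror hiragana U+3041..U+3096,
# shifted by a constant offset of 0x60 code points.
KATAKANA_START = 0x30A1
KATAKANA_END = 0x30F6
KANA_OFFSET = 0x60


def fold_kana(text):
    """Return text with katakana folded to hiragana (kanji are unchanged)."""
    # NFKC also turns half-width katakana into full-width katakana first.
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


def grep_kana(pattern, lines):
    """Hypothetical grep-like helper: keep lines that contain the pattern,
    ignoring the hiragana/katakana distinction. Original lines are returned."""
    folded_pattern = fold_kana(pattern)
    return [line for line in lines if folded_pattern in fold_kana(line)]


if __name__ == "__main__":
    corpus = ["すし を たべる", "スシ を たべる", "寿司 を たべる"]
    for hit in grep_kana("すし", corpus):
        print(hit)
```

    With something like this, a single query in either kana script would
    find both kana spellings, but the kanji spelling in the third sample
    line would still be missed, which is really the heart of my question.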

    Thanks,

    Eric J. M. Smith
    Dept. of Linguistics
    University of Toronto


