Re: [Corpora-List] Japanese corpora

From: Marco Baroni (baroni@sslmit.unibo.it)
Date: Wed Jul 20 2005 - 13:17:34 MET DST

Next message: Przemek Kaszubski: "[Corpora-List] you vs. contractions (2)"

Previous message: Adam Kilgarriff: "RE: [Corpora-List] you vs. contractions"
In reply to: Vorontsov Alexander: "[Corpora-List] Japanese corpora"

This is definitely not the ideal solution, but what we do is download
Japanese "corpora" from the web (if you are interested, I can send you URL
lists corresponding to the documents in our corpus, plus tools to download
them), and then tokenize and POS-tag them using ChaSen:

http://chasen.aist-nara.ac.jp/hiki/ChaSen/

Once you have a corpus tagged with ChaSen, you could use it to create
other resources (e.g., simple dictionaries of word/morphological-feature
pairs).
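
As a rough illustration of the dictionary idea, here is a minimal Python
sketch. It assumes the tagger output is tab-separated with one token per
line, in the order surface form, reading, base form, POS, conjugation type,
conjugation form, and with "EOS" closing each sentence. That layout is an
assumption about your ChaSen configuration, so check it against your own
output first. The function name and the sample lines are made up for the
example.

```python
from collections import defaultdict


def build_feature_dictionary(tagged_text):
    """Map each surface form to the set of analyses seen in the tagged text.

    Assumed layout (not verified against any particular ChaSen setup):
    surface, reading, base form, POS, conjugation type, conjugation form,
    separated by tabs, with "EOS" on its own line after each sentence.
    """
    dictionary = defaultdict(set)
    for line in tagged_text.splitlines():
        if not line.strip() or line.strip() == "EOS":
            continue  # skip blank lines and sentence boundaries
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # line does not match the assumed layout
        fields += [""] * (6 - len(fields))  # pad missing conjugation fields
        surface, _reading, base, pos, ctype, cform = (
            f.strip() for f in fields[:6]
        )
        dictionary[surface].add((base, pos, ctype, cform))
    return dict(dictionary)


# Hypothetical sample in the assumed format, for illustration only.
sample = (
    "猫\tネコ\t猫\t名詞-一般\t\t\n"
    "が\tガ\tが\t助詞-格助詞-一般\t\t\n"
    "走る\tハシル\t走る\t動詞-自立\t五段・ラ行\t基本形\n"
    "EOS\n"
)

features = build_feature_dictionary(sample)
for word, analyses in features.items():
    print(word, sorted(analyses))
```

Keeping a set of analyses per word means an ambiguous form simply collects
every tag it received across the corpus; you could add counts if you need
frequencies as well.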

Regards,

Marco

