Corpora: A multilingual-supportive program

From: Songlin Piao (s.piao@dcs.shef.ac.uk)
Date: Tue Jun 05 2001 - 00:40:07 MET DST


    Hi,

    A Java program with multilingual support and a graphical interface is available for download from my website: http://www.dcs.shef.ac.uk/~piao/Research/DownLoad/download.htm. It offers regular-expression search, text encoding conversion, and sentence/paragraph/title delimitation.

    In addition to Unicode, it can read text written in numerous other encodings. With a Unicode font installed, it can display many languages.
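    For readers unfamiliar with encoding conversion, here is a minimal sketch of the general idea using only the standard Java library. It is not taken from the program itself; the class and method names (EncodingConvert, toUtf8, convertFile) are made up for illustration, and it assumes the source encoding is known in advance.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the announced program.
class EncodingConvert {
    // Decode the bytes using the source charset, then re-encode as UTF-8.
    static byte[] toUtf8(byte[] input, Charset source) {
        String text = new String(input, source);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Convert a whole file; the caller supplies the source charset name.
    static void convertFile(Path in, Path out, String sourceCharsetName)
            throws IOException {
        byte[] raw = Files.readAllBytes(in);
        Files.write(out, toUtf8(raw, Charset.forName(sourceCharsetName)));
    }
}
```

    Which charset names Charset.forName accepts depends on the Java runtime; only a small set (such as ISO-8859-1, US-ASCII and the UTF variants) is guaranteed everywhere.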

    The sentence/paragraph splitting function works quite well on irregularly formatted text, except where punctuation marks are missing. It has been tested only on English, Chinese and Korean texts, but it should work for other languages that use the same punctuation marks as English.
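    To illustrate why punctuation-based splitting carries over between languages, here is a deliberately naive sketch in Java. It is not the algorithm used in the program; the class name SentenceSplit and the choice of terminator characters are assumptions made for this example. It treats the ASCII marks . ! ? and their full-width Chinese counterparts as sentence ends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the program's actual splitter.
class SentenceSplit {
    // ASCII sentence-final marks plus the ideographic full stop and the
    // full-width exclamation and question marks (an assumed set).
    private static final String TERMINATORS = ".!?\u3002\uFF01\uFF1F";

    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            current.append(c);
            boolean isTerminator = TERMINATORS.indexOf(c) >= 0;
            boolean nextIsTerminator = i + 1 < text.length()
                    && TERMINATORS.indexOf(text.charAt(i + 1)) >= 0;
            // Close the sentence at the last mark of a run such as "?!" or "...".
            if (isTerminator && !nextIsTerminator) {
                addIfNotBlank(sentences, current);
            }
        }
        addIfNotBlank(sentences, current); // trailing text with no final mark
        return sentences;
    }

    // Append the trimmed buffer if it holds any text, then reset the buffer.
    private static void addIfNotBlank(List<String> out, StringBuilder buf) {
        String s = buf.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
        buf.setLength(0);
    }
}
```

    A sketch like this splits wrongly on abbreviations and decimal numbers ("Dr.", "3.14"), and text with no terminators at all comes back as a single sentence, which is the missing-punctuation case mentioned above.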

    For details, please have a look at the webpage.

    Scott Piao
    ------------------------------------
    Dept. of Computer Science
    University of Sheffield
    Email: s.piao@dcs.shef.ac.uk


