Re: [Corpora-List] language sort

From: Trond Trosterud (trond.trosterud@hum.uit.no)
Date: Thu Jan 11 2007 - 18:09:57 MET


    Maria Esteva wrote on 10 Jan 2007 at 22:02:

    > Dear all,
    >
    > I am wondering if somebody knows of a program that will recognize
    > and sort large sets of files according to language.

    In my experience, a single file may well contain more than one
    language. In our work we identify the language down to the paragraph
    level, although we would often like to be as fine-grained as the
    sentence level.

    We use text_cat, cf.
    http://www.let.rug.nl/~vannoord/TextCat/
    and have had very good experiences with it.
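
    To illustrate what paragraph-level identification could look like,
    here is a minimal Python sketch. It is not text_cat's code or its
    interface; it only assumes the general idea of comparing ranked
    character n-gram profiles. The function names, the two toy training
    samples and the "en"/"nb" labels are all made up for the example,
    and a real setup would train on far larger samples per language.

```python
from collections import Counter
import re

# Toy training samples (hypothetical, far too small for real use).
TRAINING = {
    "en": "This is a short text in English that we use to build a language "
          "profile. We would like to find out which language each paragraph "
          "is written in.",
    "nb": "Dette er en kort tekst på norsk som vi bruker for å lage en "
          "språkprofil. Vi ønsker å finne ut hvilket språk hvert avsnitt "
          "er skrevet på.",
}


def ngram_profile(text, max_n=3, top_k=300):
    """Map each of the top_k most frequent character n-grams to its rank."""
    counts = Counter()
    # Words are runs of letters; digits, punctuation and underscores are dropped.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def identify_paragraphs(text, lang_profiles):
    """Return a (language, paragraph) pair for every paragraph in text."""
    results = []
    for para in split_paragraphs(text):
        profile = ngram_profile(para)
        best = min(
            lang_profiles,
            key=lambda lang: out_of_place(profile, lang_profiles[lang]),
        )
        results.append((best, para))
    return results


PROFILES = {lang: ngram_profile(sample) for lang, sample in TRAINING.items()}

if __name__ == "__main__":
    document = (
        "We would like to sort these files by language.\n\n"
        "Vi ønsker å sortere disse filene etter språk."
    )
    for lang, para in identify_paragraphs(document, PROFILES):
        print(lang, "|", para)
```

    Sorting whole files then amounts to counting the labels per file,
    or to keeping files with mixed labels apart. Short paragraphs are
    the weak spot of this kind of method, so sentence-level labels from
    a sketch like this should be treated with caution.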

    Trond.

    ----------------------------------------------------------------------
    Trond Trosterud t +47 7764 4763
    Institutt for språkvitskap, Det humanistiske fakultet m +47 950 70140
    N-9037 Universitetet i Tromsø, Noreg f +47 7764 5216
    Trond.Trosterud (a) hum.uit.no http://www.hum.uit.no/a/trond/
    ----------------------------------------------------------------------


