Maria Esteva wrote on 10 Jan 2007 at 22:02:
> Dear all,
>
> I am wondering if somebody knows of a program that will recognize
> and sort large sets of files according to language.
In my experience, a single file may well contain several languages.
In our work, we identify language down to the paragraph level,
although we would often like to be as fine-grained as the sentence
level.
We use text_cat, cf.
http://www.let.rug.nl/~vannoord/TextCat/
and have had very good experiences with it.
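
To illustrate what paragraph-level identification can look like, here
is a minimal sketch in Python. It is not text_cat's own code and does
not call it. It is only a toy version of the character n-gram ranking
idea (Cavnar and Trenkle's "out-of-place" measure) that text_cat is
based on. The function names, the SAMPLES training texts, the "en" and
"no" labels, and the parameters max_n=3 and top=300 are all my own
assumptions for the example. A real setup would use much larger
training texts per language.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top=300):
    """Map the `top` most frequent character n-grams (1..max_n) to their rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language get a fixed penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def identify_paragraphs(text, lang_profiles):
    """Return (language, paragraph) pairs for each blank-line-separated paragraph."""
    results = []
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        profile = ngram_profile(para)
        best = min(
            lang_profiles,
            key=lambda lang: out_of_place(profile, lang_profiles[lang]),
        )
        results.append((best, para))
    return results


# Hypothetical toy training data; far too small for real use.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog "
          "and then it runs through the woods",
    "no": "den raske brune reven hopper over den late hunden "
          "og så løper den gjennom skogen",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
```

Calling identify_paragraphs(file_contents, PROFILES) gives one language
label per paragraph, so a mixed-language file can be split rather than
sorted as a whole. Going down to sentence level is harder, since short
inputs give sparse n-gram profiles and less reliable guesses.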
Trond.
----------------------------------------------------------------------
Trond Trosterud t +47 7764 4763
Institutt for språkvitskap, Det humanistiske fakultet m +47 950 70140
N-9037 Universitetet i Tromsø, Noreg f +47 7764 5216
Trond.Trosterud (a) hum.uit.no http://www.hum.uit.no/a/trond/
----------------------------------------------------------------------
This archive was generated by hypermail 2b29 : Thu Jan 11 2007 - 18:08:30 MET