Also,
http://www.isi.edu/~och/mkcls.html
works quite well.
On Tue, 11 May 2004, Tony Berber Sardinha wrote:
> Hi Murk
>
> (1) Simple chunker:
> -First, upload your corpus at http://lael.pucsp.br/corpora/enviar and obtain a
> password
> -Then go to http://lael.pucsp.br/corpora/ngrama/index.html, enter your password
> and cluster size, click on Fazer
> -See results
> (2) N-gram Statistics Package v.0.5 (by Ted Pedersen and Satanjeev Banerjee)
> -First, upload your corpus at http://lael.pucsp.br/corpora/enviar and obtain a
> password
> -Go to http://lael.pucsp.br/corpora/nsp/index.html, enter your password and
> other options, click on Fazer
> -See results
>
> If you're on Linux / Mac OSX / Unix / Cygwin I can send you a simple Unix Shell
> script for that.
>
> cheers
> tony.
> -------------------------------------
> Dr Tony Berber Sardinha
> LAEL, PUC/SP
> (Catholic University of Sao Paulo, Brazil)
> tony4@uol.com.br
> http://lael.pucsp.br/~tony
> [New website]
>
> ----- Original Message -----
> From: "Murk Wuite" <Murk@polderland.nl>
> To: <CORPORA@HD.UIB.NO>
> Sent: Tuesday, 11 May 2004 04:24
> Subject: [Corpora-List] token clustering tool
>
>
> Dear all,
>
> Does anyone know of a tool (or algorithm), preferably available freely
> for research purposes, that takes as its input a corpus only and
> produces as its output clusters of tokens that occur close to each other
> relatively often?
>
> Best wishes,
>
> Murk Wuite
> MA student at the Department of Language and Speech, Katholieke
> Universiteit Nijmegen, The Netherlands
>
>
>
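For a rough idea of the kind of output the question describes, here is a minimal sketch in Python. It assumes a whitespace-tokenized corpus and uses pointwise mutual information (PMI) as the association score. It is not how mkcls or the N-gram Statistics Package are implemented; the function name `pmi_pairs` and its parameters `window` and `min_count` are made up for illustration.

```python
import math
from collections import Counter


def pmi_pairs(tokens, window=2, min_count=2):
    """Score ordered token pairs that co-occur within `window` positions by PMI.

    Hypothetical helper for illustration only. Returns a list of
    (score, left, right, count) tuples, highest score first.
    """
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Pair each token with the next (window - 1) tokens to its right.
        for right in tokens[i + 1 : i + window]:
            pairs[(left, right)] += 1

    total_tokens = len(tokens)
    total_pairs = sum(pairs.values())
    scored = []
    for (left, right), n in pairs.items():
        if n < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = n / total_pairs
        p_left = unigrams[left] / total_tokens
        p_right = unigrams[right] / total_tokens
        scored.append((math.log2(p_pair / (p_left * p_right)), left, right, n))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    corpus = "new york is big . new york is old . the big old city"
    for score, left, right, n in pmi_pairs(corpus.split()):
        print(f"{left} {right}\t{n}\t{score:.2f}")
```

On a real corpus you would probably want a proper tokenizer and a higher `min_count`, since PMI tends to overrate pairs that are seen only a handful of times.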
--
Hal Daume III | hdaume@isi.edu
"Arrest this man, he talks in maths." | www.isi.edu/~hdaume
This archive was generated by hypermail 2b29 : Tue May 11 2004 - 15:57:22 MET DST