Hi Murk
(1) Simple chunker:
-First, upload your corpus at http://lael.pucsp.br/corpora/enviar and obtain a
password
-Then go to http://lael.pucsp.br/corpora/ngrama/index.html, enter your password
and the cluster size, and click on Fazer (Portuguese for "Do")
-See results
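If you would rather run the Simple chunker step on your own machine, here is a minimal Python sketch. It assumes that "cluster size" means the number of consecutive words counted together (an n-gram) and uses a naive lowercase word tokenizer. Both are assumptions, since the LAEL tool's internals aren't described here, and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase runs of word characters; the web tool's
    # own tokenizer is unknown.
    return re.findall(r"\w+", text.lower())

def clusters(tokens, size):
    """Count every run of `size` consecutive tokens (an n-gram)."""
    # zip over shifted copies of the list yields each window of `size` tokens
    grams = zip(*(tokens[i:] for i in range(size)))
    return Counter(" ".join(g) for g in grams)

if __name__ == "__main__":
    text = "the cat sat on the mat and the cat sat down"
    for gram, freq in clusters(tokenize(text), 2).most_common(3):
        print(freq, gram)
```

Raising `size` gives longer clusters. Their frequencies fall off quickly, so longer clusters need larger corpora.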
(2) N-gram Statistics Package v.0.5 (by Ted Pedersen and Satanjeev Banerjee)
-First, upload your corpus at http://lael.pucsp.br/corpora/enviar and obtain a
password
-Go to http://lael.pucsp.br/corpora/nsp/index.html, enter your password, set any
other options, and click on Fazer
-See results
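NSP goes beyond raw counts: it ranks n-grams with statistical association measures. Pointwise mutual information (PMI) is a common example of such a measure. The Python sketch below illustrates the idea for adjacent word pairs only. It isn't NSP's own code and doesn't reproduce its options; the function name `bigram_pmi` and the `min_freq` cutoff are hypothetical.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_freq=2):
    """Rank adjacent word pairs by pointwise mutual information.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated as relative frequencies in `tokens`.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs
    scores = {}
    for (x, y), f_xy in bigrams.items():
        # Rare pairs get inflated PMI scores, so drop them.
        if f_xy < min_freq:
            continue
        p_xy = f_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring (most strongly associated) pairs first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    words = "new york is big and new york is old".split()
    for pair, score in bigram_pmi(words):
        print(" ".join(pair), round(score, 2))
```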
If you're on Linux / Mac OS X / Unix / Cygwin, I can send you a simple Unix shell
script for that.
cheers
tony.
-------------------------------------
Dr Tony Berber Sardinha
LAEL, PUC/SP
(Catholic University of Sao Paulo, Brazil)
tony4@uol.com.br
http://lael.pucsp.br/~tony
----- Original Message -----
From: "Murk Wuite" <Murk@polderland.nl>
To: <CORPORA@HD.UIB.NO>
Sent: Tuesday, 11 May 2004 04:24
Subject: [Corpora-List] token clustering tool
Dear all,
Does anyone know of a tool (or algorithm), preferably freely available
for research purposes, that takes as its input a corpus only and
produces as its output clusters of tokens that occur close to each other
relatively often?
Best wishes,
Murk Wuite
MA student at the Department of Language and Speech, Katholieke
Universiteit Nijmegen, The Netherlands
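The question asks about tokens that occur close to each other, not only immediately next to each other. A minimal way to approximate that is to count pairs of tokens that fall within a fixed window. The Python sketch below does this. The window size, frequency cutoff and stopword list are assumptions to be tuned, the function name is hypothetical, and it returns pairs rather than larger groups of tokens.

```python
from collections import Counter

def window_cooccurrences(tokens, window=3, min_freq=2, stopwords=frozenset()):
    """Count unordered pairs of distinct tokens occurring within `window`
    positions of each other; keep pairs seen at least `min_freq` times."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        if left in stopwords:
            continue
        # Look only ahead, so each co-occurrence is counted once
        for right in tokens[i + 1 : i + 1 + window]:
            if right != left and right not in stopwords:
                # Sort the pair so ("tea", "strong") == ("strong", "tea")
                pairs[tuple(sorted((left, right)))] += 1
    # most_common() lists pairs from most to least frequent
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_freq]

if __name__ == "__main__":
    words = "strong tea and strong coffee and strong tea again".split()
    print(window_cooccurrences(words, window=2, stopwords={"and", "again"}))
```

Without a stopword list, frequent function words such as "and" tend to top the list. Combining the window counts with an association measure such as PMI is one way to favour pairs that co-occur more often than chance.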
This archive was generated by hypermail 2b29 : Tue May 11 2004 - 15:41:49 MET DST