Corpora: Results: NLP Pre-processing Suite

Richard Evans (in6087@wlv.ac.uk)
Thu, 10 Dec 1998 11:07:35 +0000

Hi everyone,

Just a note to thank everyone who replied to my query:
______________________________________________________________________________

I'm looking to download a collection of tools for extracting
information and formatting corpora prior to running my own programs on
them.

In particular, I'd like to find:

1. A PoS Tagger (also returning person, gender, and number information),
2. A Sentence Splitter,
3. A Tokenizer,
4. An NP-Extractor,
5. A Parser.

for pre-processing English instruction manuals (I've noticed
that some robust parsers aren't geared for imperative sentences).

If anyone has any recommendations, I'd be delighted to hear them. I'll
post the results as soon as I have them.
______________________________________________________________________________

In case there's anyone else who ISN'T aware of the range of software
being used, a summary of replies follows.

===A range of tools is available from the ever-helpful Oliver Mason:

______________________________________________________________________________

//\\ computer officer | corpus research | department of english | school of -
//\\ humanities | university of birmingham | edgbaston | birmingham b15 2tt -
\\// united kingdom | phone +44-(0)121-414-6206 | fax +44-(0)121-414-5668/\ -
\\// mobile 07050 104504 | http://www-clg.bham.ac.uk | o.mason@bham.ac.uk\/ -

______________________________________________________________________________

===Several respondents (
Chris Brew <Chris.Brew@edinburgh.ac.uk>,
Colin Matheson <colin@cogsci.ed.ac.uk>,
Simone Teufel <simone@cogsci.ed.ac.uk>
)
mentioned the tools available from the University of Edinburgh's
Language Technology Group at:
______________________________________________________________________________

http://www.ltg.ed.ac.uk/software
______________________________________________________________________________

===Annette Preissner <noemi@dfki.de> indicated software at:
______________________________________________________________________________

http://www.lpl.univ-aix.fr/projects/multext/
______________________________________________________________________________

===Max Schulze <bschulze@xis.xerox.com> directed me to the tools at the
Xerox Research Center Europe. The contact there is Ken Beesley.
______________________________________________________________________________

Ken.Beesley@xrce.xerox.com
______________________________________________________________________________

===Pasi Tapanainen <Pasi.Tapanainen@conexor.fi> indicated that all but
the parser are available (on what looks like a commercial basis) from:
______________________________________________________________________________

http://www.conexor.fi/info-tools.html
______________________________________________________________________________

===Atro Voutilainen <voutilai@ling.helsinki.fi> and Pasi Tapanainen
showed me a parser geared for imperative sentences. When I tested the
demo, the visual FDG version looked interesting, but the output seemed
to consist of blue and red balls rather than syntactic symbols. The
sample analysis looked good, though.
______________________________________________________________________________

http://www.conexor.fi/analysers.html
______________________________________________________________________________

===Thorsten Brants <thorsten@CoLi.Uni-SB.DE> offered a Part of Speech
tagger at:
______________________________________________________________________________

http://www.coli.uni-sb.de/~thorsten/tnt/
______________________________________________________________________________

Thank you one and all,

_____________________________________________
|                                           |
| Richard Evans                             |
|___________________________________________|
| Computational Linguistics Research Group, |
| School of Languages and European Studies, |
| University of Wolverhampton,              |
| UK.                                       |
|___________________________________________|