Corpora: NLP Pre-processing suite

R.J.Evans (in6087@ccug.wlv.ac.uk)
Mon, 7 Dec 1998 11:14:33 GMT

Hi everyone,

I'm looking to download a collection of tools for extracting
information and formatting corpora prior to running my own programs on
them.

In particular, I'd like to find:

1. A PoS Tagger (also returning person, gender, and number information),
2. A Sentence Splitter,
3. A Tokenizer,
4. An NP-Extractor,
5. A Parser.

I need these for pre-processing English instruction manuals (I've
noticed that some robust parsers aren't geared to imperative sentences).

If anyone has any recommendations, I'd be delighted to hear them. I'll
post a summary of the replies to the list as soon as I have them.

With thanks in advance,

|-------------------------------------------|
| Richard Evans                             |
|-------------------------------------------|
| Computational Linguistics Research Group, |
| School of Languages and European Studies, |
| University of Wolverhampton,              |
| UK.                                       |
|-------------------------------------------|