[Corpora-List] Cost of POS tagging, again

From: Kevin B. Cohen (kevin.cohen@gmail.com)
Date: Wed Dec 27 2006 - 00:48:02 MET

Next message: Elisabete Marques Ranchhod: "[Corpora-List] Deadline Extension - Special Issue of Lingvisticae Investigationes on Named Entities"

Previous message: Tina Waldman: "[Corpora-List] corpua of academic articles"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi, Marc et al.,

Christopher's points are well made. A couple of other things to think
about:

1) You seem to be envisioning doing ex nihilo manual POS annotation.
However, that will probably be neither practical nor desirable; rather,
you're likely to want to do the initial annotation automatically, and then
manually curate the output of that automatic step.
2) You may not actually want to curate the POS tagging directly at all.
Rather, if you're going to do further processing--say, syntactic
parsing--you might want to curate the POS tags as part of, or as a
byproduct of, the downstream curation.
3) Even if you do want to directly curate the POS tagging, you will probably
find some efficiencies to be gained from automatic means. For example, you
are more likely to need to correct a bunch of adjective/past participle
distinctions (I'm assuming here that your data is English) than you are to
need to correct a bunch of mis-tagged commas (although I have certainly seen
lots of mis-POS-tagged commas, too!). Scripting can help you out here.
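
To make point 3 a bit more concrete, here is a minimal sketch of the kind of
script I have in mind. It is only an illustration under some assumptions:
that your tagger emits (word, tag) pairs using Penn Treebank tags (JJ for
adjectives, VBN for past participles, "," for commas), and that a crude
"ends in -ed or -en" test is good enough to pick out the tokens worth a
human look. The function name triage and the variable names are made up for
the example; adjust the heuristics to whatever errors you actually see.

```python
from collections import Counter

# Assumption: Penn Treebank tags. JJ = adjective, VBN = past participle.
REVIEW_TAGS = {"JJ", "VBN"}


def triage(tagged_tokens):
    """Split automatically tagged tokens into fixed output and a review queue.

    tagged_tokens: list of (word, tag) pairs from an automatic tagger.
    Returns (fixed_tokens, review_indices, auto_fix_counts).
    """
    fixed = []
    review_queue = []
    auto_fixes = Counter()
    for index, (word, tag) in enumerate(tagged_tokens):
        if word == "," and tag != ",":
            # A comma has only one plausible tag, so correct it without a human.
            auto_fixes[(word, tag)] += 1
            tag = ","
        elif tag in REVIEW_TAGS and word.lower().endswith(("ed", "en")):
            # Hypothetical heuristic: -ed/-en forms tagged JJ or VBN are the
            # likely adjective/past participle confusions, so queue them.
            review_queue.append(index)
        fixed.append((word, tag))
    return fixed, review_queue, auto_fixes


if __name__ == "__main__":
    sample = [("The", "DT"), ("tired", "VBN"), ("annotator", "NN"),
              (",", "NN"), ("paused", "VBD"), (".", ".")]
    fixed, queue, fixes = triage(sample)
    for i in queue:
        print(i, fixed[i])
    print(dict(fixes))
```

The curator then only looks at the queued positions rather than at every
token, and the counts of automatic fixes give you a rough picture of which
systematic errors the tagger is making.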

Finally, Christopher is right to suggest hourly, rather than per-token,
budgeting.

Hope this is helpful,

Kevin

-- 
K. B. Cohen
Biomedical Text Mining Group Lead
Center for Computational Pharmacology
303-916-2417 (cell) 303-377-9194 (home)
http://compbio.uchsc.edu/Hunter_lab/Cohen


This archive was generated by hypermail 2b29 : Wed Dec 27 2006 - 12:40:53 MET