Re: Corpora: lemma vs lexeme

Kenneth W. Church (kwc@research.att.com)
Fri, 05 Nov 1999 08:25:52 -0500

Tony is, of course, quite right. There was a lot of activity on part-of-speech
tagging in the late 80's, and several taggers from that period (including my own)
are still being used quite a bit. A number of additional taggers have become
available since that time. It is hard to say that the newer taggers are better
than what came before. Evaluation is quite tricky. None of these taggers works
as well as we would like, but the standard evaluation methods are hard-pressed to
say that one tagger is much better than another. Some taggers claim to be more
accurate than others and some don't. The jury is still out on the accuracy
question.

It shouldn't be too hard to find plenty of discussion of work in the 80's. Almost
any web search engine will find something. A paper based on Steven DeRose's
thesis appeared in Computational Linguistics, which should be easy to find.
Almost every major conference proceedings (e.g., Coling, ACL) since the late 80's
has a couple of papers referencing part-of-speech taggers, and many of these
references point back to the 80's.

- ken

"Mcenery, Tony" wrote:

> Hi Paul
>
> > I did a Ph.D. with Sinclair at Birmingham in the early 90's which
> > revolved around this topic. At that time, there were no efficient POS
> > taggers and so lemmatization could not be carried out on a POS tagged
> > text.
> [Mcenery, Tony]
> Sorry for the anglo-centric joke, but I must say "Shome mishtake
> shurely"? There were, I have good reason to believe, efficient POS taggers
> available for English from the mid-80s. Automated lemmatisation of English was
> also being undertaken around that time (at least). Sorry to nitpick.
>
> Tony