Re: Corpora: looking for trained WinBrill files

Tony Berber Sardinha (tony4@uol.com.br)
Wed, 10 Feb 1999 09:26:54 -0200

Hi,

The version I've installed is not WinBrill but a DOS implementation (I
believe Takahashi's, because it used GO32), and the quality looks good to
me. For example, this is the tagged output of part of your message:

From/IN the/DT first/JJ place/NN I/PRP got/VBD the/DT PC/NNP version/NN
of/IN the/DT tagger/NN I/PRP was/VBD looking/VBG
for/IN and/CC from/IN the/DT latter/NN I/PRP just/RB copied/VBD the/DT
lexicon/NN and/CC rule/NN files/NNS
(since/NN I/PRP didn't/VBD see/VB any/DT on/IN Takashi's/NNP site)./NNP
Now,/NNP the/DT problem/NN is/VBZ the/DT
fairly/RB low-quality/VB result/NN I/PRP obtain/VBP when/WRB trying/VBG
this/DT kit/NN even/RB on/IN simple/JJ
English/JJ phrases./NN Surprisingly/RB I/PRP get/VBP 'were'/NN
erroneously/RB tagged/VBN as/IN an/DT
NNP/NN in/IN so/RB many/JJ cases,/NN for/IN instance./NN I/PRP suppose/VBP
the/DT problem/NN is/VBZ caused/VBN
by/IN the/DT file/NN with/IN the/DT contextual/JJ rules/NNS which/WDT
came/VBD with/IN the/DT package./NN
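
Output in this word/TAG form can be checked mechanically, for example to list every word that received a given tag. The following is a minimal Python sketch, assuming tokens are separated by whitespace and the tag follows the last slash in each token. The function names are hypothetical and not part of the tagger distribution.

```python
from collections import Counter

def parse_tagged(text):
    """Split word/TAG output into (word, tag) pairs.

    Assumes whitespace-separated tokens with the tag after the last '/'.
    Tokens that contain no '/' are skipped.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if sep:
            pairs.append((word, tag))
    return pairs

def words_with_tag(pairs, tag):
    """Return the words that received the given tag, in order."""
    return [word for word, t in pairs if t == tag]

# A few tokens taken from the tagged output above.
sample = "Now,/NNP the/DT problem/NN is/VBZ the/DT PC/NNP version/NN"
pairs = parse_tagged(sample)
print(words_with_tag(pairs, "NNP"))
print(Counter(tag for _, tag in pairs).most_common(3))
```

Note that punctuation stays attached to the words (e.g. "Now,/NNP"), because the input was not tokenised before tagging; that may account for some of the odd tags.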

The command line was:
tagger LEXICO~1.BRO temp.txt BIGRAMS LEXICA~1.BRO CONTEX~1.BRO >temp.out
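
The same call could be made from a script. The Python sketch below only assembles the five arguments in the order used above and redirects standard output to a file. The roles given to the arguments (lexicon, input text, bigrams, lexical rules, contextual rules) are an assumption read off the file names, and the helper names are hypothetical.

```python
import subprocess

def tagger_command(lexicon, corpus, bigrams, lexical_rules, contextual_rules,
                   exe="tagger"):
    """Build the argument list in the same order as the command line above."""
    return [exe, lexicon, corpus, bigrams, lexical_rules, contextual_rules]

def run_tagger(cmd, out_path):
    """Run the tagger and write its standard output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)

cmd = tagger_command("LEXICO~1.BRO", "temp.txt", "BIGRAMS",
                     "LEXICA~1.BRO", "CONTEX~1.BRO")
print(" ".join(cmd))
# run_tagger(cmd, "temp.out")  # requires the tagger executable on the PATH
```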

The lexical rules and contextual rules are based on the Brown corpus and
came with the distribution, probably in the file
ftp://ftp.cs.jhu.edu/pub/brill/Programs/RULE_BASED_TAGGER_V.1.14.tar.Z
which you've downloaded.

If you need anything I'll be happy to help.

tony.
-------------------------------
Dr Tony Berber Sardinha
Catholic University of Sao Paulo, Brazil
tony4@uol.com.br
http://sites.uol.com.br/tony4/homepage.html
http://homepages.infoseek.com/~corpuslinguistics/homepage.html
-------------------------------

----------
> From: johan.hagman@jrc.it
> To: CORPORA@hd.uib.no
> Subject: Corpora: looking for trained WinBrill files
> Date: 10 February 1999 07:29
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> Hi, out there!
>
>
> I just got into this list and am trying it for the first time.
>
> Some time ago I downloaded two Brill tagger packages
> (from http://member.nifty.ne.jp/htakashi/dos/ and
> ftp://ftp.cs.jhu.edu/pub/brill/Programs/RULE_BASED_TAGGER_V.1.14.tar.Z).
>
> From the first place I got the PC version of the tagger I was looking
> for and from the latter I just copied the lexicon and rule files
> (since I didn't see any on Takashi's site). Now, the problem is the
> fairly low-quality result I obtain when trying this kit even on simple
> English phrases. Surprisingly I get 'were' erroneously tagged as an
> NNP in so many cases, for instance. I suppose the problem is caused
> by the file with the contextual rules which came with the package.
>
> Using the flag -i is said to give an intermediate file (which would
> help tracing the way the rules have been applied and thereby
> facilitate the modification of these or their order) but it doesn't work.
>
> My question to you is whether you know of any better version of these
> files (publicly available or "lendable" in exchange for my reference to
> whomever the credits are due) which give a more decent result.
>
> Time is a little too scarce for training the tagger myself. Any files
> which perform better than those in this training kit would be welcome!
>
> The languages of interest are ENG, FRE, and GER.
>
>
> Sorry for bothering you
> with this "novice" question
>
> Johan Hagman
> johan.hagman@jrc.it
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -