Corpora: New Parser

From: Gojol (gojol@sunu.rnc.ro)
Date: Wed Dec 06 2000 - 14:51:00 MET

Dear Colleagues,

Those interested in a new parser (based on an original
philosophy), briefly introduced below, are invited to
contact me personally (gojol@sunu.rnc.ro). Any suggestions,
comparisons with existing parsers, etc. will be welcome.
Thank you,
Vlad V. Gojol

............................................................

   After learning from a 46,000-word POS-tagged corpus and a
32,000-word parsed (treebank) corpus, the parser processes a
2,000-word text (not included in either corpus) in 18
seconds (tagging excluded) on a 200 MHz machine, with 4%
incomplete trees. Even for these declared failures it
provides well-formed trees sufficient for a subsequent
translator. The extracted grammar has about 12,000 rules.
The Negra corpus of German is used. After learning from a
17,000-word parsed corpus and from the same 46,000-word
POS-tagged one, a 2,000-word text included in the first (but
excluded from the second), which guarantees that the grammar
is complete relative to it (i.e. contains all the rules
necessary for its correct parsing), is processed in 4
seconds with no incomplete trees. The extracted grammar has
about 7,000 rules. Parsing is 2-3 times slower on the
English corpus Susanne. The system is language independent,
with wide-character support.
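
   As a rough illustration of what "extracting a grammar"
from a treebank can mean, the sketch below reads off one
context-free rule per internal tree node and turns the
counts into relative frequencies. It is not the parser
described in this message. It assumes Penn-style bracketed
trees (the Negra export format differs), and the toy trees
and the names parse_bracketed, extract_rules and
build_grammar are hypothetical.

```python
from collections import Counter

# Hypothetical toy treebank in Penn-style bracketing (an assumption).
TREEBANK = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (DT a) (NN cat)) (VP (VBD slept)))",
    "(S (NP (NNP Rex)) (VP (VBD barked)))",
]


def parse_bracketed(text):
    """Turn one bracketed tree into nested (label, children) tuples.

    Leaves (words) are plain strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def extract_rules(tree, counts):
    """Count one rule LHS -> RHS for every non-lexical node."""
    label, children = tree
    # Nodes that dominate only words (tag -> word) are skipped here.
    if all(isinstance(child, str) for child in children):
        return
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            extract_rules(child, counts)


def build_grammar(treebank):
    """Return (rule counts, relative frequency of each rule per LHS)."""
    counts = Counter()
    for line in treebank:
        extract_rules(parse_bracketed(line), counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    probs = {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}
    return counts, probs


if __name__ == "__main__":
    counts, probs = build_grammar(TREEBANK)
    for (lhs, rhs), n in sorted(counts.items()):
        print(lhs, "->", " ".join(rhs), n, round(probs[(lhs, rhs)], 3))
```

   On a real treebank the number of distinct rules grows
with the corpus size, which is consistent with the larger
grammar reported above for the larger parsed corpus.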
   The parser may accept a set of rules intended to refine
the statistical grammar deduced from the corpus. Moreover,
it can take a context-free grammar as its only input (in
which case it ceases to be a statistical parser), but in
this operating mode it requires much time and memory (during
learning, not during parsing as such) if the grammar is
very large. The statistical grammar is refined not by simply
adding the proposed rules, but by modifying the corpus, so
as to exploit all the real contexts possible for them.
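
   This message does not say which algorithm the parser uses
when it is given only a context-free grammar. Purely as a
generic illustration of parsing from such a grammar, here is
a minimal CKY recognizer, a standard textbook technique and
not the author's method. It assumes the grammar is already
in Chomsky normal form; the toy grammar and the names
LEXICAL, BINARY and cky_recognize are hypothetical.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (an assumption).
# LEXICAL maps a word to the tags that can produce it.
LEXICAL = {
    "the": {"DT"},
    "dog": {"NN"},
    "cat": {"NN"},
    "saw": {"VBD"},
}
# BINARY maps a pair of right-hand-side symbols to possible parents.
BINARY = {
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("VBD", "NP"): {"VP"},
}


def cky_recognize(words, lexical, binary, start="S"):
    """Return True if the grammar derives the word sequence from start."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the nonterminals that can span words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the halves
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((left, right), set())
    return start in chart[0][n]


if __name__ == "__main__":
    print(cky_recognize("the dog saw the cat".split(), LEXICAL, BINARY))
    print(cky_recognize("dog the saw".split(), LEXICAL, BINARY))
```

   A recognizer only answers yes or no; a parser would also
store back-pointers in each chart cell to rebuild the tree.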

