Re: [Corpora-List] fast string replacement

From: Jörg Schuster (joerg.schuster@gmail.com)
Date: Fri Mar 11 2005 - 17:17:49 MET


    > Two further questions:
    >
    > - What exactly do you mean by "fast"?

    I mean really, REALLY fast. My rewriting dictionary currently has 1
    million lines (and it will grow larger). My corpus is 80GB, and I would
    like to be able to tag it often.

    > - Do you mean string replacement (arbitrary substrings in a line of
    > text) or word replacement?

    String replacement. I usually build the dictionary so that only true
    lexemes are tagged, whether they are single words or multi-word units.
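
To make the kind of replacement discussed here concrete, below is a minimal sketch in Python. It is not the setup used in this thread and not what the tools mentioned further down do. It assumes the rewriting dictionary can be loaded as a mapping from source string to replacement string, and it uses a plain character trie with leftmost-longest matching, so a multi-word unit wins over a shorter entry that starts at the same position. The names `build_trie` and `rewrite` and the tag strings are hypothetical.

```python
def build_trie(rules):
    """Build a character trie from a {source: replacement} mapping."""
    root = {}
    for source, replacement in rules.items():
        if not source:
            continue  # skip empty keys; they would match at every position
        node = root
        for ch in source:
            node = node.setdefault(ch, {})
        node[None] = replacement  # the None key marks the end of an entry
    return root


def rewrite(text, trie):
    """Replace leftmost-longest dictionary matches in one left-to-right scan."""
    out = []
    i, n = 0, len(text)
    while i < n:
        node = trie
        j = i
        best_end, best_repl = -1, None
        # Walk the trie as far as the text allows, remembering the longest hit.
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best_end, best_repl = j, node[None]
        if best_end != -1:
            out.append(best_repl)
            i = best_end  # continue after the replaced span
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical three-entry dictionary, for illustration only.
    rules = {
        "New York": "<LOC>New York</LOC>",
        "New": "<ADJ>New</ADJ>",
        "York": "<LOC>York</LOC>",
    }
    print(rewrite("New York is new", build_trie(rules)))
    # prints: <LOC>New York</LOC> is new
```

The lookup cost per text position depends on the length of the longest dictionary entry rather than on the number of entries, which is the property that matters when the dictionary grows. This sketch matches arbitrary substrings and does not check word boundaries, and a Python dict trie for a million entries would be memory-hungry; an Aho-Corasick automaton or a compiled finite-state transducer would be the more realistic choice at 80GB scale.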

    > Schmid's FST toolkit (see http://www.ims.uni-stuttgart.de/~schmid) and
    > Steve Abney's cascaded parser CASS (you'll have to search Google for
    > the source code).

    I will try this. Thank you.

    Jörg Schuster



    This archive was generated by hypermail 2b29 : Mon Mar 14 2005 - 10:56:30 MET