[Corpora-List] Summary: fast string replacement

From: Jörg Schuster (joerg.schuster@gmail.com)
Date: Tue Mar 15 2005 - 14:08:33 MET

    Hello,

    thanks to all who participated in this discussion.

    First I have to apologize for my original posting (or mail?): I asked
    for programs for transducing strings. I wrote 'strings (!)' to
    indicate that I really meant strings (and not regular expressions or
    tokens). Yet the examples I gave misled some people, because they did
    not include cases of transduction of multi-word lexemes.
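
    To make the task concrete, here is a minimal Python sketch of what is
    meant by transducing strings: literal source strings (which may contain
    spaces, as multi-word lexemes do) are rewritten in a single
    left-to-right pass, the longest match wins, and rewritten output is
    never rewritten again. The function name transduce and the sample rules
    are assumptions made for illustration; they come from none of the
    programs listed below.

```python
def transduce(text, rules):
    """Rewrite literal strings in one left-to-right pass (longest match wins)."""
    if not rules:
        return text
    max_len = max(len(source) for source in rules)
    out = []
    i, n = 0, len(text)
    while i < n:
        # Try the longest possible candidate at position i first.
        for length in range(min(max_len, n - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in rules:
                out.append(rules[candidate])
                i += length  # skip the matched input; output is not rescanned
                break
        else:
            # No rule matches at this position: copy one character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical rewrite dictionary with one multi-word entry.
rules = {"New York": "NY", "New": "Old", "a": "b", "b": "c"}
print(transduce("New York is a New b", rules))
```

    Note that "a" becomes "b" but is not rewritten further to "c", which is
    what would happen if the rules were applied one after another.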

    In the remainder of this message I will give an overview of the
    suggested solutions. The solution that I like best is Paul Bijnens' C
    program (12).

    For brevity, I will mostly omit the names of the people who pointed
    me to the sites.

    (1) Max Silberztein: http://www.nyu.edu/pages/linguistics/intex/
    (2) Helmut Schmid: http://www.ims.uni-stuttgart.de/~schmid
    (3) Stephan Kanthak:
    http://www-i6.informatik.rwth-aachen.de/~kanthak/fsa.html
    (4) Gertjan van Noord: http://grid.let.rug.nl/~vannoord/Fsa/fsa.html
    (5) Arnaud Adant: http://membres.lycos.fr/adant/tfe/
    (6) ISI: http://www.isi.edu/licensed-sw/carmel/
    (7) MIT: http://people.csail.mit.edu/people/ilh/fst/

    Comments: (1)-(6) all look like really serious programs. Yet, I
    considered them to be too complicated for my purposes.

    (7) is not available at the moment.

    (8) ?: ftp://ftp.gnu.org/non-gnu/flex/
          Comment: good, but overkill for my purposes.

    (9) Songlin Piao pointed me to a Java tool of his:
          http://www.lancs.ac.uk/staff/piaosl/research/download/download.htm

          Comment: I tried to use it, but it did not work:
          $ java -jar mlct_concordance.jar
          Invalid or corrupt jarfile mlct_concordance.jar

    (10) Leif Arda Nielsen advised me to use sed.
            Comment: too slow.

    (11) Damon Allen Davison advised me to use SQL.
            Comment: I did not quite understand Damon's mail.

    (12) Paul Bijnens pointed me to a C program of his:
            http://torvald.aksis.uib.no/corpora/repl.zip
            Comment: This program is great.
            - It worked immediately. (No fumbling around with paths,
               (versions of) compilers and the like.)
            - It doesn't seem to care about the size of the rewrite
              dictionary (except that you need to have enough RAM, of course).
            - It is quite fast: I gave it a rewrite dictionary of 1 million
              entries. It transduced about 50MB per minute on an Athlon 2600+.
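
    How the program achieves this is not described here. As an
    illustration only, one common way to keep the work per input position
    independent of the number of dictionary entries is to store the rewrite
    dictionary in a trie and to follow the input through it, remembering
    the last complete entry seen. The Python sketch below assumes this
    technique; the names build_trie and rewrite and the sample entries are
    hypothetical, and this is not a description of Paul Bijnens' code.

```python
END = None  # marker key; cannot collide with one-character string keys


def build_trie(rules):
    """Store every source string of the rewrite dictionary in a nested-dict trie."""
    root = {}
    for source, target in rules.items():
        node = root
        for ch in source:
            node = node.setdefault(ch, {})
        node[END] = target
    return root


def rewrite(text, trie):
    """Replace the leftmost-longest dictionary entry at each position."""
    out = []
    i, n = 0, len(text)
    while i < n:
        node, j = trie, i
        best_end, best_target = -1, None
        # Walk the trie as far as the input allows, remembering the
        # longest complete entry seen so far.
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                best_end, best_target = j, node[END]
        if best_end != -1:
            out.append(best_target)
            i = best_end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical entries; a real rewrite dictionary would be read from a file.
trie = build_trie({"New York": "NY", "New": "Old", "ab": "X", "abcd": "Y"})
print(rewrite("New York, New Jersey, abcx", trie))
```

    The inner loop runs at most as many steps as the longest source string
    has characters, however many entries the dictionary holds; the price is
    the memory needed for the trie.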

    Jörg Schuster


