Re: [Corpora-List] Looking for a XML to TEXT convertor/editor

From: Serge HEIDEN (Slh@ens-lsh.fr)
Date: Tue Nov 28 2006 - 19:06:03 MET


    On Tuesday, November 28, 2006 4:13 PM [GMT+1=CET],
    Alexandre Rafalovitch <arafalov@gmail.com> wrote:

    >> Processing anything even a tiny bit more complex requires a big jump
    >> in XML specific rules and workarounds. Solutions designed
    >> specifically for XML are much better in the long run.

    My mention of textonly, from the LT XML toolkit, was in the
    same spirit. Because it is built on a native SGML and XML toolkit,
    textonly can handle some XML-specific features. See, for example,
    some of its options. Excerpt from 'man textonly':

    usage: textonly [-d ddb-file] [-u base-url] [-t tag] [-s c]
         [-x] [file]

         -t <tag>
              If specified only text inside <tag ...> ... </tag> is
              printed. <tag> is the name of an SG/XML element.

         -s <str>
              If present, the STRING <str> (e.g. ' ' or "\^J") is
              printed between each bit of text.

         -x   If present, expand internal SDATA and numerical
              character references.
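
    To make the -t and -s options concrete, here is a rough Python
    sketch of the same idea. It is not textonly itself and makes no
    claim about how textonly is implemented: the function names
    text_only and outermost, and the parameters tag and sep, are
    hypothetical, and the parser is the standard library's
    xml.etree.ElementTree (XML only, no SGML or SDATA). That parser
    always expands numerical character references, so there is no
    counterpart to -x here.

```python
import xml.etree.ElementTree as ET


def outermost(elem, tag):
    """Collect the outermost elements named `tag`, in document order.

    Nested elements with the same name are not returned separately,
    so their text is not counted twice.
    """
    if elem.tag == tag:
        return [elem]
    found = []
    for child in elem:
        found.extend(outermost(child, tag))
    return found


def text_only(xml_source, tag=None, sep=""):
    """Return the character data of an XML string.

    tag: if given, keep only text inside elements with that name
         (the idea behind -t; an assumption, not textonly's code).
    sep: string placed between successive pieces of text
         (the idea behind -s).
    """
    root = ET.fromstring(xml_source)
    scopes = [root] if tag is None else outermost(root, tag)
    pieces = []
    for scope in scopes:
        # itertext() yields each text node inside the element.
        pieces.extend(scope.itertext())
    return sep.join(pieces)


if __name__ == "__main__":
    doc = ("<doc><head>Title</head>"
           "<p>Caf&#233; &amp; <b>bar</b>!</p><p>Second</p></doc>")
    print(text_only(doc))                    # all text, no separator
    print(text_only(doc, tag="p", sep="|"))  # only text inside <p>
```

    With tag="p", the title is dropped and the character reference
    &#233; comes out as a real character, which is the kind of detail
    a plain text filter would not get right.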

    XMLStarlet, being based on libxml2, should likewise be very robust
    in handling XML-specific features and various extensions.
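
    As a small illustration of why a real parser matters, the Python
    sketch below compares a naive regular-expression tag stripper with
    a parser-based extraction. Note the assumptions: it uses Python's
    standard xml.etree.ElementTree (built on expat, not libxml2), and
    the function names strip_tags_naive and text_via_parser are
    hypothetical. It only shows the general point, not the behaviour
    of XMLStarlet or textonly.

```python
import re
import xml.etree.ElementTree as ET

# A small document with a comment, entity references and a CDATA section.
DOC = ("<doc><!-- a <b>comment</b> -->"
       "<p>x &lt; y &amp; <![CDATA[a < b]]></p></doc>")


def strip_tags_naive(source):
    """Remove anything that looks like <...>; knows nothing about XML."""
    return re.sub(r"<[^>]+>", "", source)


def text_via_parser(source):
    """Parse the document and join its text nodes."""
    return "".join(ET.fromstring(source).itertext())


if __name__ == "__main__":
    # The naive version leaks comment text, leaves entities unexpanded
    # and throws away the CDATA content.
    print(repr(strip_tags_naive(DOC)))
    # The parser drops the comment, expands entities and keeps CDATA text.
    print(repr(text_via_parser(DOC)))
```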

    Best,

    -S
    _____________________________________________________________
    Serge Heiden, slh@ens-lsh.fr, https://weblex.ens-lsh.fr
    ENS-LSH/CNRS - ICAR UMR5191, Institut de Linguistique Française
    15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883


