Re: [Corpora-List] Looking for a XML to TEXT convertor/editor

From: Alexandre Rafalovitch (arafalov@gmail.com)
Date: Tue Nov 28 2006 - 16:13:50 MET


    It seems to me that we are descending into methods that cater less
    and less for the edge cases possible with XML. Certainly, sed or even
    perl would only work if the XML encoding is the most primitive (e.g.
    tags with no elements only, no named entities, etc.). Processing
    anything even a tiny bit more complex requires a big jump in
    XML-specific rules and workarounds. Solutions designed specifically
    for XML are much better in the long run.
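
    To illustrate the point, here is a minimal sketch (in Python rather
    than sed or perl, and with a made-up sample document) of what a
    regex-only tag stripper does with an entity and a CDATA section. The
    function name naive_strip and the sample string are mine, purely for
    illustration:

```python
import re


def naive_strip(xml_text):
    # Naive approach: delete anything that looks like a tag.
    # No entity decoding and no awareness of CDATA sections.
    return re.sub(r"<[^>]+>", "", xml_text)


# Hypothetical sample: one predefined entity and one CDATA section.
sample = "<doc><p>Fish &amp; chips</p><p><![CDATA[a < b > c]]></p></doc>"

if __name__ == "__main__":
    # The entity stays encoded and the CDATA content is mangled.
    print(naive_strip(sample))
```

    The "&amp;" is left undecoded, and the CDATA section is cut at its
    first ">", so part of the real text is lost and a stray "]]>" is
    left behind.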

    I initially recommended XMLStarlet as a more comprehensive
    solution, but given the other options, here is how to use it for
    plain tag stripping while still taking XML special cases into account:
    <location_xmlstarlet>\xml sel -T -t -m / -v . xmlfile.xml
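
    For those without XMLStarlet at hand, a roughly comparable sketch
    using a real XML parser from the Python standard library might look
    like the following. This is not the XMLStarlet command itself, only
    an approximation of its effect; the function name strip_tags and the
    sample string are assumptions of mine. Note that named entities other
    than the five predefined ones would still need a DTD.

```python
import xml.etree.ElementTree as ET


def strip_tags(xml_text):
    # Let a real XML parser handle entities and CDATA,
    # then concatenate all text nodes in document order.
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


# Hypothetical sample: one predefined entity and one CDATA section.
sample = "<doc><p>Fish &amp; chips</p><p><![CDATA[a < b > c]]></p></doc>"

if __name__ == "__main__":
    print(strip_tags(sample))
```

    Here the entity is decoded to "&" and the CDATA content comes through
    intact, because the parser, not a pattern, decides what is markup.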

    Regards,
       Alex.

    On 11/28/06, Notis Toufexis <notis.toufexis@gmail.com> wrote:
    > This one is for all who are not into sed, perl etc.
    >
    > Jedit's (Java based text editor, www.jedit.org) XML plugin has a "Remove all
    > tags" command.
    >
    > It might win the prize for the fastest way to do it, too.


