Re: [Corpora-List] Looking for a XML to TEXT convertor/editor

From: Notis Toufexis (notis.toufexis@gmail.com)
Date: Tue Nov 28 2006 - 10:52:45 MET


    This one is for anyone who is not into sed, Perl, etc.

    The XML plugin for jEdit (a Java-based text editor, www.jedit.org) has a
    "Remove all tags" command.

    It might win the prize for the fastest way to do it, too.

    Notis

    On 11/28/06, Martin Wynne <martin.wynne@oucs.ox.ac.uk> wrote:
    >
    > I'd use sed too, although I don't think Oliver's command will catch
    > cases where there is a line break between the < and the >, so it
    > typically won't catch long comments in the markup, for example. If you
    > run the following first:
    >
    > cat yourxmltext | grep "<" | grep -v ">" | less
    >
    > it should show any lines with just an opening "<", and alert you to the
    > presence of any potential problems.
    >
    > Martin
    >
    > Oliver Mason wrote:
    > > With sed it's even easier...
    > >
    > > cat yourxmltext | sed 's/<[^>]*>//g' > yourplaintext
    > >
    > > This removes everything in '<..>'; not as complete as Lou's earlier
    > > suggestion regarding XSLT, but I guess it wins the prize for the
    > > shortest solution...
    > >
    > > Oliver
    > >
    > > On 27/11/06, Daniel Zeman <zeman@ufal.mff.cuni.cz> wrote:
    > >> If you have Perl on your machine (default on Linux), the attached Perl
    > >> script could help you.
    > >
    > >
    >
    >
    > --
    > Martin Wynne
    > Head of the Oxford Text Archive and
    > AHDS Literature, Languages and Linguistics
    >
    > Oxford University Computing Services
    > 13 Banbury Road
    > Oxford
    > UK - OX2 6NN
    > Tel: +44 1865 283299
    > Fax: +44 1865 273275
    > martin.wynne@oucs.ox.ac.uk
    >
    >
    >
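
For the multi-line case Martin mentions, here is a minimal sketch of a sed variant that reads the whole file into the pattern space before substituting, so a tag or comment broken across lines is still removed. The function name strip_tags is made up for illustration and appears nowhere in the thread. It assumes the file fits comfortably in memory and that no comment or attribute value contains a literal ">"; it also leaves entities such as &amp; untouched.

```shell
# strip_tags: hypothetical helper name, not part of the original thread.
# Reads XML on stdin and writes the text without tags to stdout.
strip_tags() {
  # :a, $!N, $!ba  keep appending the next line to the pattern space
  #                until the last input line has been read
  # s/<[^>]*>//g   [^>] also matches newlines, so tags and comments that
  #                span several lines are removed in the same pass
  sed -e ':a' -e '$!N' -e '$!ba' -e 's/<[^>]*>//g'
}

# Usage (file names as in Oliver's example):
#   strip_tags < yourxmltext > yourplaintext
```

Because the substitution runs once over the whole text, the line breaks that sat inside a tag disappear with it, while line breaks between tags are kept.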

    -- 
    http://www.early-modern-greek.org
    http://www.mml.cam.ac.uk/greek/grammarofmedievalgreek/
    


