Re: [Corpora-List] Looking for a XML to TEXT convertor/editor

From: Martin Wynne (martin.wynne@oucs.ox.ac.uk)
Date: Tue Nov 28 2006 - 10:37:21 MET


    I'd use sed too, although I don't think Oliver's command will catch
    cases where there is a line break between the < and the >, so it
    typically won't catch long comments in the markup, for example. If you
    run the following first:

    cat yourxmltext | grep "<" | grep -v ">" | less

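    it should show any lines that contain an opening "<" but no ">", and
    so alert you to tags or comments that run over a line break. A line
    that also contains a complete tag won't show up, so treat it as a
    quick check rather than a guarantee.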
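
    For the multi-line case itself, one option is to have sed read the
    whole file into its pattern space before substituting, so that a tag
    or comment broken across lines is removed as a unit. What follows is
    only a sketch, not something taken from Oliver's or Lou's messages:
    the function name strip_tags is made up for illustration, and it
    assumes the file is small enough to hold in memory and has no literal
    ">" inside a comment or attribute value.

```shell
#!/bin/sh
# strip_tags (hypothetical helper name): remove everything between "<" and
# the next ">", including tags that span several lines.
strip_tags() {
    # :a / $!{ N; ba } appends every remaining line to the pattern space,
    # so the substitution below sees the whole input at once and [^>]*
    # can run across embedded newlines.
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/<[^>]*>//g'
}

# Small demonstration: the opening tag is split over two lines.
printf 'a <b\nc="d">e</b>\n' | strip_tags
```

    On a real file this would be run as: strip_tags < yourxmltext > yourplaintext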

    Martin

    Oliver Mason wrote:
    > With sed it's even easier...
    >
    > cat yourxmltext | sed 's/<[^>]*>//g' > yourplaintext
    >
    > This removes everything in '<..>'; not as complete as Lou's earlier
    > suggestion regarding XSLT, but I guess it wins the prize for the
    > shortest solution...
    >
    > Oliver
    >
    > On 27/11/06, Daniel Zeman <zeman@ufal.mff.cuni.cz> wrote:
    >> If you have Perl on your machine (default on Linux), the attached Perl
    >> script could help you.
    >
    >

    -- 
    Martin Wynne
    Head of the Oxford Text Archive and
    AHDS Literature, Languages and Linguistics
    

    Oxford University Computing Services
    13 Banbury Road
    Oxford UK - OX2 6NN
    Tel: +44 1865 283299
    Fax: +44 1865 273275
    martin.wynne@oucs.ox.ac.uk


