Re: [Corpora-List] Looking for a XML to TEXT convertor/editor

From: Lou (lou.burnard@computing-services.oxford.ac.uk)
Date: Mon Nov 27 2006 - 10:40:23 MET


    The easiest way to do this properly, provided the files are well-formed
    XML, is to use an XSLT stylesheet.

    A completely empty stylesheet will, by default, simply give you the text
    content of the input XML, because XSLT's built-in template rules visit
    every element and copy every text node to the output.
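
    A minimal sketch of such a stylesheet follows. The xsl:output line is an
    addition here and not strictly part of an "empty" stylesheet: without it,
    a processor such as xsltproc defaults to XML output and will normally
    write an XML declaration ahead of the text.

    ```xml
    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <!-- Ask for plain text so that no XML declaration is written. -->
      <xsl:output method="text" encoding="utf-8"/>

      <!-- No templates at all: the built-in rules walk the tree and copy
           every text node. Attribute values are not output. -->

    </xsl:stylesheet>
    ```

    Whitespace between elements counts as text too, so expect the output to
    carry over the indentation and blank lines of the input.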

    Try this:

    1. Download and install an XSLT processor such as xsltproc.

    2. Create a file like the following:

    -------------------

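    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <!-- Write plain text rather than XML. -->
      <xsl:output method="text" encoding="utf-8" />

      <!-- An empty template: nothing inside <teiHeader> is output. -->
      <xsl:template match="teiHeader"/>

      <!-- Process the children of <text>, so its text content is output. -->
      <xsl:template match="text">
        <xsl:apply-templates/>
      </xsl:template>

    </xsl:stylesheet>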

    -----------------

    In this example, the content of any <teiHeader> element in the input
    will be suppressed, and the content of any <text> element will be passed
    through. If your document uses different names for the elements, you can
    edit the above as needed.
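
    One case where editing the names alone may not be enough: if your files
    declare a default namespace, an unprefixed pattern such as
    match="teiHeader" matches nothing in XSLT 1.0, and the header text comes
    through with everything else. Here is a sketch of the namespaced form,
    assuming the TEI P5 namespace http://www.tei-c.org/ns/1.0 (check the
    xmlns attribute on your own root element; the prefix "tei" is an
    arbitrary choice):

    ```xml
    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:tei="http://www.tei-c.org/ns/1.0"
                    version="1.0">

      <xsl:output method="text" encoding="utf-8"/>

      <!-- The prefix ties the pattern to the namespace, so the header
           is matched and suppressed. -->
      <xsl:template match="tei:teiHeader"/>

    </xsl:stylesheet>
    ```

    No template for the text element is needed in this sketch, since the
    built-in rules already pass its content through.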

    Run the stylesheet either by referencing it from your XML files with an
    <?xml-stylesheet?> processing instruction or, more easily, by using a
    standalone processor such as xsltproc:

    xsltproc mystylesheet.xsl myinputfile > myoutputfile
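
    For the first option, the processing instruction is spelled
    xml-stylesheet and goes near the top of the XML file, after the XML
    declaration and before the root element. A sketch, assuming the
    stylesheet is saved as mystylesheet.xsl in the same directory as the
    file; the TEI.2 root element is only a placeholder for your own:

    ```xml
    <?xml version="1.0" encoding="utf-8"?>
    <?xml-stylesheet type="text/xsl" href="mystylesheet.xsl"?>
    <TEI.2>
      <teiHeader>...</teiHeader>
      <text>...</text>
    </TEI.2>
    ```

    It is mostly web browsers that act on this instruction, and how they
    display plain-text output varies, so for producing files the command
    line is likely to be the more dependable route.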

    Federica Barbieri wrote:
    > Dear List Members,
    >
    >
    > For my dissertation research, I will need to convert several corpus files in
    > XML format into TEXT, so that I can process these files with some of the
    > programs for linguistic analysis that we have here at NAU, all of which are
    > designed to process text files (with line breaks).
    >
    > So, I am looking for a good, user-friendly XML to TEXT convertor or editor and
    > was wondering if anyone knows of any or has used any that they would
    > recommend.
    >
    > So far I've tried to use the XML FoxAdvance (available at
    > http://xmlfox.com/index.htm). However I've had no luck with the trial version
    > of this program and the support has been unhelpful (they suggested that I try
    > some other product by some of their competitors...).
    >
    > I would appreciate any suggestions and I will post a summary if there is
    > interest.
    >
    > Thanks!
    >
    > Federica Barbieri
    >
    > *****************
    > Federica Barbieri
    > PhD Candidate in Applied Linguistics
    > Department of English
    > Northern Arizona University
    > Liberal Arts Building, BOX 6032
    > Flagstaff, AZ 86011-6032
    >
    > Office: BAA 322
    > Tel: (928) 523 0291
    > Fax: (928) 523 7074
    > email: Federica.Barbieri@NAU.EDU
    >
    >
    >
    >


