Re: [Corpora-List] structured data (enu | csy) for IE needed

From: Yannick Versley (versley@sfs.uni-tuebingen.de)
Date: Thu Jan 25 2007 - 09:44:48 MET

    Hello,

    > for my graduation thesis, I need a set of structured data for some
    > experiments. The data set should consist of XML files, HTML files, or any
    > other hypertext-based files. The next requirement is "highly structured
    > data". This means that I'm not interested in data structured like the
    > following example:
    > <p>Paragraph, many words in same tag</p>
    > I'm looking for data that are more structured, like this example:
    > <t> <tag2>Few words (up to 10)</tag2> <tag3>Few words (up to 10)</tag3>
    > </t>
    > The last requirement is: English or Czech domain.

    My guess would be that Wikipedia fits your description: it contains many
    tables and templates, and it is available in both English and Czech. I
    don't know whether anyone has tried extracting specific information from
    it, though.
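
    To check whether a candidate file meets the "few words per tag"
    requirement, one could count the words in each leaf element. This is only
    a rough sketch in Python, assuming the input is well-formed XML (raw HTML
    pages would need a more lenient parser first). The function names are
    made up for illustration, and the limit of 10 words is taken from the
    example in the question:

```python
import xml.etree.ElementTree as ET


def leaf_word_counts(xml_text):
    """Return the word count of every leaf element (an element with no children)."""
    root = ET.fromstring(xml_text)
    counts = []
    for elem in root.iter():
        if len(elem) == 0:  # no child elements, so this is a leaf
            # Only the element's own text is counted; tail text is ignored.
            counts.append(len((elem.text or "").split()))
    return counts


def is_highly_structured(xml_text, max_words=10):
    """Hypothetical criterion: every leaf element holds at most max_words words."""
    counts = leaf_word_counts(xml_text)
    return bool(counts) and max(counts) <= max_words


# The two shapes from the question: short leaves vs. one long paragraph.
structured = "<t><tag2>Few words here</tag2><tag3>A few more words</tag3></t>"
flat = "<p>" + "word " * 40 + "</p>"

print(is_highly_structured(structured))
print(is_highly_structured(flat))
```

    The same count could be collected over a whole dump to see which pages
    or templates come closest to the structure you are after.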

    Best,
    Yannick

    -- 
    Yannick Versley
    Seminar für Sprachwissenschaft, Abt. Computerlinguistik
    Wilhelmstr. 19, 72074 Tübingen
    Tel.: (07071) 29 77352
    


