[Corpora-List] structured data (enu | csy) for IE needed

From: Filip Malik (filip.malik@centrum.cz)
Date: Thu Jan 25 2007 - 08:25:13 MET


    Hello all,

    For my graduation thesis, I need a set of structured data for some experiments.
    The data set should consist of XML files, HTML files, or any other hypertext-based files.
    The next requirement is "highly structured data". This means that I am not interested
    in data structured like the following example:
    <p>Paragraph, many words in same tag</p>
    I am looking for data that are more structured, like this example:
    <t> <tag2>Few words (up to 10)</tag2> <tag3>Few words (up to 10)</tag3> </t>
    The last requirement is that the data should be in English or Czech.
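
One rough way to check a candidate file against the "few words per tag" requirement is sketched below. It is only an illustration under stated assumptions: the function name `short_text_ratio` is hypothetical, the 10-word threshold is taken from the example above, the input is assumed to be well-formed XML, and only the text directly at the start of each element is counted (text that follows a child element is ignored).

```python
import xml.etree.ElementTree as ET

MAX_WORDS = 10  # threshold taken from the "up to 10" in the example above


def short_text_ratio(xml_string, max_words=MAX_WORDS):
    """Return the share of text-bearing elements holding at most max_words words.

    Hypothetical helper: only each element's leading text is inspected;
    tail text after child elements is not counted.
    """
    root = ET.fromstring(xml_string)
    counts = []
    for elem in root.iter():
        words = (elem.text or "").split()
        if words:  # skip elements that contain only whitespace or children
            counts.append(len(words))
    if not counts:
        return 0.0
    return sum(1 for n in counts if n <= max_words) / len(counts)


# The two shapes from the message: fine-grained tags versus one long paragraph.
structured = "<t> <tag2>Few words here</tag2> <tag3>Few more words</tag3> </t>"
flat = "<p>" + " ".join(["word"] * 40) + "</p>"

print(short_text_ratio(structured))
print(short_text_ratio(flat))
```

A ratio close to 1 suggests data of the second, more structured kind; a ratio close to 0 suggests long paragraphs inside single tags. Real HTML is often not well-formed XML, so a more tolerant parser would be needed for such files.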

    I hope that somebody who reads Corpora has used a similar data set that
    could be reused. My goal is information extraction (IE) from hypertext,
    using both the content and the structure of the data.

    Thanks and regards,
    Filip Malik

    -fm


