Hello,
> for my graduation thesis, I need a set of structured data for some
> experiments. The data set should consist of XML files, HTML files, or any
> other hypertext-based files. The next requirement is "highly structured
> data". This means that I'm not interested in data structured like the
> following example:
> <p>Paragraph, many words in same tag</p>
> I'm looking for data that is more structured, like this example:
> <t> <tag2>Few words (up to 10)</tag2> <tag3>Few words (up to 10)</tag3>
> </t>
> The last requirement is: English or Czech domain.
My guess would be that Wikipedia fits your description: it contains many
tables and templates, and it is available in both English and Czech. I
don't know whether anyone has tried extracting specific information from
them, though.
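
To check whether a candidate file meets the "few words per tag" requirement,
one could count the words in each leaf element. The following is only a
minimal sketch, assuming well-formed XML and Python's standard library. The
function names and the max_words parameter are illustrative assumptions; only
the "up to 10" figure comes from the question.

```python
import xml.etree.ElementTree as ET


def leaf_word_counts(xml_text):
    """Return the word count of each leaf element (an element without children)."""
    root = ET.fromstring(xml_text)
    counts = []
    for elem in root.iter():
        if len(elem) == 0:  # no child elements, so this is a leaf
            counts.append(len((elem.text or "").split()))
    return counts


def is_highly_structured(xml_text, max_words=10):
    """Hypothetical criterion: every leaf element holds at most max_words words."""
    counts = leaf_word_counts(xml_text)
    return bool(counts) and all(c <= max_words for c in counts)


if __name__ == "__main__":
    sample = (
        "<t> <tag2>Few words (up to 10)</tag2> "
        "<tag3>Few words (up to 10)</tag3> </t>"
    )
    print(leaf_word_counts(sample))
    print(is_highly_structured(sample))
```

Real HTML pages are often not well-formed XML, so they would need a more
tolerant parser before a check like this could be applied.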
Best,
Yannick
-- 
Yannick Versley
Seminar für Sprachwissenschaft, Abt. Computerlinguistik
Wilhelmstr. 19, 72074 Tübingen
Tel.: (07071) 29 77352
This archive was generated by hypermail 2b29 : Thu Jan 25 2007 - 09:42:36 MET