Re: [Corpora-List] XML encoding database of tagged documents

From: Lou (lou.burnard@computing-services.oxford.ac.uk)
Date: Mon Jun 05 2006 - 21:29:02 MET DST

    I know of no application of TEI which uses more than a very small
    proportion of the 600+ elements it defines in total (probably a bit more
    than 1%, but certainly less than 10!). The point about the TEI standard
    is that it is designed to be modular and customisable, so that you can
    use it to develop interchangeable resources. If I've understood your
    intended application right, you're talking about a kind of standoff
    annotation, which would allow you to create pseudo documents consisting
    of pointers into a separate text file: this is what the <span> element
    provides (probably not <milestone>s, since they are embedded within the
    text itself). A document containing such pointers is still, I think, a
    text document, and so can be described by a suitable subset of TEI.
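
    To make the standoff idea concrete, here is a minimal sketch in Python.
    It is an illustration only, not TEI markup and not anyone's actual
    implementation: the names Span, make_span, codes_at and segment, the
    sample sentence, and the codes PRICE and COMPLAINT are all assumptions
    invented for this example. Each code is stored outside the text as a
    pair of character offsets, so overlapping codes need no special handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One standoff annotation: a code plus offsets into a separate text."""
    code: str   # user-defined code (hypothetical label)
    start: int  # inclusive character offset into the base text
    end: int    # exclusive character offset into the base text


def make_span(text: str, code: str, phrase: str) -> Span:
    """Build a Span covering the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Span(code, start, start + len(phrase))


def segment(text: str, span: Span) -> str:
    """Resolve a pointer: return the stretch of text the span refers to."""
    return text[span.start:span.end]


def codes_at(spans: list[Span], offset: int) -> list[str]:
    """Return, sorted, every code whose span covers the given offset."""
    return sorted(s.code for s in spans if s.start <= offset < s.end)


# The base text is left untouched; the annotations live alongside it.
TEXT = "The price was too high and delivery was late."
SPANS = [
    make_span(TEXT, "PRICE", "price was too high"),
    make_span(TEXT, "COMPLAINT", "too high and delivery was late."),
]

if __name__ == "__main__":
    for s in SPANS:
        print(s.code, s.start, s.end, repr(segment(TEXT, s)))
    # "too high" lies inside both spans, so both codes are reported there.
    print(codes_at(SPANS, TEXT.index("too")))
```

    Because the two spans overlap on "too high", a lookup at that position
    reports both codes, while the text file itself contains no markup at all.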

    However, we probably shouldn't burden readers of this list with a
    theological debate! If you'd like to send me a sample of the kind of
    thing you have in mind, I'd be glad to make more concrete suggestions
    off list.

    Another XML-based standard you might consider in this context is Topic
    Maps, which perform a similar kind of annotation function.

    best wishes

    Lou

    Normand Peladeau wrote:

    > Well! TEI is a great standard but is much more than what I need.
    > Maybe 99% of what they propose would not be very useful for the kind
    > of application I am trying to do.
    >
    > I don't need to keep information about the text structure or about
    > linguistic or typographic features. The only elements that I need to
    > keep inside the documents are user-defined codes attached to text
    > segments. Those codes can be overlapping (the "milestone" element
    > proposed by TEI may offer a solution for this, but I'm not entirely
    > sure it handles all the situations pretty well, so some tests will be
    > needed). As for comments, they are not attached to the document itself
    > but to the user defined codes, so I'm not sure they are equivalent to
    > TEI <note> element.
    >
    > I have some clients in the market research industry and in legal firms
    > who are doing manual annotations of documents in databases and are not
    > at all interested in the kind of information normally provided by a
    > TEI compliant document. What I am looking for is a more basic set of
    > XML standards that are used to import and export databases containing
    > documents (but also numerical data, dates, etc.) and where the only
    > relevant elements in the documents are the user-defined codes attached
    > to text segments (sometimes overlapping).
    >
    > Normand


