[Corpora-List] XML annotation guidelines

From: Simpson, Rita (ritacsim@umich.edu)
Date: Fri Jun 06 2003 - 15:35:15 MET DST

    > Dear Corporist Colleagues,
    >
    > We are in the process of converting our corpus of transcribed
    > academic speech from SGML to XML, and adding further annotation.
    > Can anyone point us to some standards or (preferably) precedents
    > for XML-ized annotation of:
    >
    > 1) POS tagging
    > and
    > 2) pragmatic markup (e.g., text segments manually identified as 'narrative',
    > 'disagreement', 'request', etc.)
    >
    > Within the TEI guidelines (P4), we've found some suggestions for POS
    > tagging (but nothing yet for something like our pragmatic categories), e.g.:
    >
    > <s type="sentence">
    > <w ana="at">The</w>
    > <w ana="nn1">victim</w>
    > <m ana="gen">'s</m>
    > <w ana="nn2">friends</w>
    > ...
    > </s>
    >
    > But somehow this seems a bit more verbose than it needs to be.
    > Is this format standard, or are there other XML-style annotation
    > formats in use?
    >
    > Thanks much for any leads. We'd especially appreciate getting
    > pointers to specific sections of the TEI guidelines that we may be
    > overlooking, or references to any user-friendly documentation
    > (other than the TEI) -- the XCES seems to be lacking in this
    > respect at present.
    >
    > Sincerely,
    >
    > Rita Simpson & the MICASE team
    > English Language Institute
    > University of Michigan
    >
    >
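For readers weighing the verbosity concern raised in the message, here is a minimal sketch of how the quoted TEI-style fragment could be read back into (token, tag) pairs with Python's standard library. The function names `tagged_tokens` and `to_underscore_format` are hypothetical, the elided "..." tokens are simply left out, and the `word_TAG` rendering is only an illustrative compact form, not a format recommended by the TEI or by the original poster.

```python
import xml.etree.ElementTree as ET

# The fragment from the message; the elided "..." tokens are omitted.
SAMPLE = """<s type="sentence">
<w ana="at">The</w>
<w ana="nn1">victim</w>
<m ana="gen">'s</m>
<w ana="nn2">friends</w>
</s>"""


def tagged_tokens(xml_text):
    """Return (text, tag) pairs for each <w> or <m> child of the <s> element."""
    sentence = ET.fromstring(xml_text)
    return [(el.text, el.get("ana")) for el in sentence if el.tag in ("w", "m")]


def to_underscore_format(pairs):
    """Render pairs as word_TAG tokens (an assumed compact form for comparison)."""
    return " ".join(f"{text}_{tag}" for text, tag in pairs)


pairs = tagged_tokens(SAMPLE)
print(pairs)
print(to_underscore_format(pairs))
```

The element-per-token layout is longer on the page, but any XML parser can recover the token/tag pairing without a custom tokenizer, and further attributes can be added to each token later without changing the reading code.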


