[Corpora-List] BOUNCE corpora@lists.uib.no: Non-member submission from [David Reitter <dreitter@inf.ed.ac.uk>] (fwd)

From: Knut Hofland (knut@aksis.uib.no)
Date: Thu Jul 14 2005 - 16:23:31 MET DST


    From: David Reitter <dreitter@inf.ed.ac.uk>
    Subject: Re: [Corpora-List] Communicator corpora parsed?
    Date: Thu, 14 Jul 2005 14:13:59 +0100
    To: corpora@hd.uib.no

    I received two replies to my earlier question regarding the
    availability of syntactic annotations of the DARPA Communicator
    corpus and of other spoken dialogue corpora.
    Both Sandra Kübler at Tübingen and Detmar Meurers at Ohio State
    recommended the Verbmobil treebanks, which contain spoken dialogue in
    German, English and Japanese. They are available via

    http://www.phonetik.uni-muenchen.de/Bas/BasHomeeng.html

    A newer version of the German treebank is in preparation.

    As a side note: much (if not most) of the non-canned, spontaneous
    speech in Communicator consists of very short utterances. In
    contrast, the Maptask corpus (developed here at HCRC, Edinburgh;
    spoken human-human dialogue) has a lot to offer in terms of syntax.

    Thanks for the replies.

    > is anyone aware of syntactic annotations of the (e.g. DARPA)
    > Communicator corpus, or similar large, task-oriented human/machine
    > or human/human dialogue corpora?
    > I'm looking for tree structures, and atomic categories such as VP
    > or PP would do just fine. I could work with non-perfect (i.e.
    > machine-parsed) annotations.
    > Generally I'd be grateful for tips regarding larger spoken
    > dialogue corpora (task-oriented dialogue) that have been
    > syntactically annotated.


