Re: [Corpora-List] Communicator corpora parsed?

From: David Reitter (david.reitter@gmail.com)
Date: Fri Jul 15 2005 - 15:43:29 MET DST

    I received two replies to my earlier question regarding the
    availability of syntactic annotations of the DARPA Communicator
    corpus and of other spoken dialogue corpora.
    Both Sandra Kübler at Tübingen and Detmar Meurers at Ohio State
    recommended the Verbmobil treebanks, which contain spoken dialogue in
    German, English and Japanese. They are available via

    http://www.phonetik.uni-muenchen.de/Bas/BasHomeeng.html

    A newer version of the German treebank is in preparation.

    As a side note: much (if not most) of the non-canned, spontaneous
    speech in Communicator consists of very short utterances. In
    contrast, the Maptask corpus (developed here at HCRC, Edinburgh;
    spoken human-human dialogue) has a lot to offer in terms of syntax.

    Thanks for the replies.

    > is anyone aware of syntactic annotations of the (e.g. DARPA)
    > Communicator corpus, or similar large, task-oriented human/machine
    > or human/human dialogue corpora?
    > I'm looking for tree structures, and atomic categories such as VP
    > or PP would do just fine. I could work with non-perfect (i.e.
    > machine-parsed) annotations.
    > Generally I'd be grateful for tips regarding larger spoken
    > dialogue corpora (task-oriented dialogue) that have been
    > syntactically annotated.
    >

    --
    David Reitter - ICCS/HCRC, Informatics, University of Edinburgh
    


