Re: [Corpora-List] Re: transcribing video corpora

From: Martin Wynne (martin.wynne@oucs.ox.ac.uk)
Date: Wed Nov 15 2006 - 16:16:53 MET


    Saying that the CHILDES system is "basically ASCII" doesn't tell much of
    the story (as well as begging lots of questions about non-English texts,
    Unicode compliance, etc.).

    I think anyone considering this route should think very carefully. Using
    an annotation system such as CHAT, which is not conformant to open
    standards, and which requires specific software to use the texts, can
    mean that the usability of the data is very restricted. Some of the
    software is open source and available under a GNU licence, but not all,
    as far as I can see. CHAT is a de facto standard for a few communities
    of linguists, but not for the vast majority of researchers who might
    want to use language resources, and is not even widely known in
    mainstream corpus linguistics. CHAT-encoded texts cannot easily be used
    with generic software that deals with texts or the other data streams in
    multimedia data. To put it simply, with XML files you can use style
    sheets, web browsers, and more sophisticated programs made available via
    web services, and with CHAT files you can't. Finally, the reliance on
    the CHAT software means that it is not a format which is appropriate for
    the long-term preservation of the data.

    While using the CHILDES transcription system may appear to be a viable
    route for data development because of the current availability of tools,
    guidelines and a lively user community, choosing this route will block
    the majority of potentially interested researchers from using the data,
    and restrict the ways in which it can be exploited. Unless there now
    exist some migration tools from CHAT to a sensible form of XML (or some
    more standards-based system), I wouldn't recommend this route. Can anyone
    shed more light on the migration facilities?

    Martin

    James_L._Fidelholtz wrote:
    > Alex Boulton wrote:
    > <<Does anyone know of any free tools which help with transcription of
    > video corpora? What we would ideally like would be a kind of video
    > version of Transcriber (WinPitch is a bit complicated for our needs),
    > ie which allows multimedia alignment of transcription, sound & video,
    > plus the usual tools for annotating etc.>>
    > Hi, Alex,
    > Check out the CHILDES site. They have all sorts of transcription aids,
    > as well as analysis tools (as long as you transcribe in their system,
    > which is basically ASCII (I'm sure it's updated by now to permit
    > slightly more 'elegant' transcriptions, ie ANSI)). The tools are quite
    > useful. (Child Language Data Exchange System: childes.psy.cmu.edu/)
    > Don't let the 'child language' label fool you: the system is quite
    > versatile and general (and, of course, includes video capabilities of
    > the sort you are looking for). Now, I have not entered deeply into
    > this system, being an oldie but fogey, but my wife (who works on child
    > language) uses it and swears by it.
    > Jim
    > James L. Fidelholtz
    > Posgrado en Ciencias del Lenguaje, ICSyH
    > Benemérita Universidad Autónoma de Puebla MÉXICO
    >
    >

    -- 
    Martin Wynne
    Head of the Oxford Text Archive and
    AHDS Literature, Languages and Linguistics
    Oxford University Computing Services
    13 Banbury Road, Oxford, UK - OX2 6NN
    Tel: +44 1865 283299 Fax: +44 1865 273275
    martin.wynne@oucs.ox.ac.uk


