RE: CHILDES (Was: RE: [Corpora-List] Re: transcribing video corpora)

From: Hardie, Andrew (a.hardie@lancaster.ac.uk)
Date: Thu Nov 16 2006 - 14:55:45 MET

Next message: Santos Diana: "[Corpora-List] RE: [Corpora-List] Parallel corpora and word alignment, WAS: American and British English spelling converter"

Previous message: Hong Huaqing: "RE: [Corpora-List] transcribing video corpora"
In reply to: hkaalep: "CHILDES (Was: RE: [Corpora-List] Re: transcribing video corpora)"
Next in thread: Herr Herrner: "Re: [Corpora-List] transcribing video corpora"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

In theory it is possible to map out of CHAT to XML, run your favourite XML-compliant morphological analyser, use a script or stylesheet to adjust the annotation format back to the CHAT schema, then map from XML back into CHAT. I don't know if anyone's ever actually done this, however!
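The round trip described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a tested converter. The XML element names (`transcript`, `header`, `u`, `w`) are invented here and are not the TalkBank XML schema. `toy_analyser` is a hypothetical stand-in for a real XML-aware morphological analyser, and its tags are made up. The sketch also ignores existing dependent tiers and continuation lines. The `%mor` output only mimics the general `pos|word` shape of a CHAT %mor tier.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample transcript; real CHAT files carry more headers.
SAMPLE_CHAT = """@Begin
@Participants:\tCHI Target_Child, MOT Mother
*CHI:\tmore cookie .
*MOT:\tyou want more ?
@End"""


def chat_to_xml(chat_text):
    """Step 1: map headers and main tiers to simple, invented XML."""
    root = ET.Element("transcript")
    for line in chat_text.splitlines():
        if line.startswith("@"):
            ET.SubElement(root, "header").text = line[1:]
        elif line.startswith("*"):
            speaker, _, words = line[1:].partition(":\t")
            utt = ET.SubElement(root, "u", who=speaker)
            for token in words.split():
                ET.SubElement(utt, "w").text = token
    return root


def toy_analyser(root):
    """Step 2: hypothetical stand-in for a third-party XML analyser.

    It adds a made-up 'pos' attribute to each <w> element.
    """
    lexicon = {"more": "qn", "cookie": "n", "you": "pro", "want": "v"}
    for w in root.iter("w"):
        if w.text in {".", "?", "!"}:
            w.set("pos", "punct")
        else:
            w.set("pos", lexicon.get(w.text, "unk"))
    return root


def xml_to_chat(root):
    """Steps 3-4: turn the annotation into a %mor-style tier and write CHAT."""
    lines = []
    for el in root:
        if el.tag == "header":
            lines.append("@" + el.text)
        elif el.tag == "u":
            words = el.findall("w")
            lines.append(f"*{el.get('who')}:\t" + " ".join(w.text for w in words))
            # Punctuation is copied as-is; other words become pos|word.
            mor = [w.text if w.get("pos") == "punct" else f"{w.get('pos')}|{w.text}"
                   for w in words]
            lines.append("%mor:\t" + " ".join(mor))
    return "\n".join(lines)


if __name__ == "__main__":
    print(xml_to_chat(toy_analyser(chat_to_xml(SAMPLE_CHAT))))
```

On the sample, each main tier comes back followed by a `%mor:` line. For the first utterance, that line is `qn|more n|cookie .`. Swapping `toy_analyser` for a real tool would also require adapting steps 1 and 3 to whatever XML that tool reads and writes.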

Andrew.

-----Original Message-----
From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On Behalf Of hkaalep
Sent: 16 November 2006 10:47
To: 'Martin Wynne'
Cc: corpora@uib.no
Subject: CHILDES (Was: RE: [Corpora-List] Re: transcribing video corpora)

I wonder if choosing CHILDES will lead you to a dead end?

Recently a colleague of mine, who is working on Estonian, wanted to use a
morphological analyser on her texts. The CHILDES way of doing it would be to
use MOR in CLAN (Computerized Language Analysis). However,
http://childes.psy.cmu.edu/manuals/CLAN.pdf states on page 113:

"5.37 MOR
The MOR program is used to generate a %mor tier for all main tiers in a CHAT
file.
Successful use of MOR requires a full understanding of the operation of the
program, the process of lexicon building, and the use of methods for
improving the morphological analysis. MOR is a complex program that is
intended for the serious user who is willing to commit a large amount of
time and effort in order to achieve a major improvement in analytic
capabilities."

I am not familiar with CHILDES, but it looks as if it is impossible to plug
in an existing third-party program for your purpose.
So you are left with two options: become an expert in building a
morphological analyser in CLAN, or abandon CHILDES.

Or am I missing something here?

Heiki Kaalep,
Univ. of Tartu, Estonia

-----Original Message-----
From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
Behalf Of Martin Wynne
Sent: Wednesday, November 15, 2006 5:17 PM
To: James_L._Fidelholtz
Cc: Alex Boulton; corpora@uib.no
Subject: Re: [Corpora-List] Re: transcribing video corpora

Saying that the CHILDES system is "basically ASCII" doesn't tell much of the
story (as well as begging lots of questions about non-English texts, Unicode
compliance, etc.).

I think anyone considering this route should think very carefully. Using an
annotation system such as CHAT, which is not conformant to open standards,
and which requires specific software to use the texts, can mean that the
usability of the data is very restricted. Some of the software is open
source and available under a GNU licence, but not all, as far as I can see.
CHAT is a de facto standard for a few communities of linguists, but not for
the vast majority of researchers who might want to use language resources,
and is not even widely known in mainstream corpus linguistics. CHAT-encoded
texts cannot easily be used with generic software that deals with texts or
the other data streams in multimedia data. To put it simply, with XML files
you can use style sheets, web browsers, and more sophisticated programs made
available via web services, and with CHAT files you can't. Finally, the
reliance on the CHAT software means that it is not a format which is
appropriate for the long-term preservation of the data.

While using the CHILDES transcription system may appear to be a viable route
for data development because of the current availability of tools,
guidelines and a lively user community, choosing this route will block the
majority of potentially interested researchers from using the data, and
restrict the ways in which it can be exploited. Unless there now exist some
migration tools from CHAT to a sensible form of XML (or some other
standards-conformant system), I wouldn't recommend this route. Can anyone shed more light on the
migration facilities?
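The sketch below gives a rough idea of what a migration involves. It maps the basic CHAT line types (`@` headers, `*` main tiers, `%` dependent tiers) onto XML, assuming that continuation lines begin with a tab. It then queries the result with the XPath subset built into Python's standard `xml.etree.ElementTree`. This is a hypothetical illustration only. The element and attribute names are invented and are not the TalkBank XML schema. It also does not parse anything inside a tier (codes, morphology, timing links).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample transcript (headers, main tiers, a dependent tier).
SAMPLE_CHAT = """@Begin
@Languages:\teng
@Participants:\tCHI Target_Child, MOT Mother
*MOT:\twhat is that ?
%com:\tpoints at the dog
*CHI:\tdoggie .
@End"""


def chat_to_xml(chat_text):
    """Map CHAT line types onto invented XML elements (not TalkBank XML)."""
    root = ET.Element("transcript")
    current = None   # element that a continuation line extends
    last_utt = None  # utterance that dependent tiers attach to
    for line in chat_text.splitlines():
        if line.startswith("\t") and current is not None:
            # Assumed CHAT convention: continuation lines begin with a tab.
            current.text = (current.text or "") + " " + line.strip()
        elif line.startswith("@"):
            name, _, value = line[1:].partition(":\t")
            current = ET.SubElement(root, "header", name=name)
            current.text = value
        elif line.startswith("*"):
            who, _, text = line[1:].partition(":\t")
            last_utt = ET.SubElement(root, "u", who=who)
            current = ET.SubElement(last_utt, "main")
            current.text = text
        elif line.startswith("%") and last_utt is not None:
            tier, _, text = line[1:].partition(":\t")
            current = ET.SubElement(last_utt, "dep", tier=tier)
            current.text = text
    return root


if __name__ == "__main__":
    root = chat_to_xml(SAMPLE_CHAT)
    # Generic XPath-style queries from the standard library, no CHAT tools:
    print([u.find("main").text for u in root.findall("u[@who='CHI']")])
    print([d.text for d in root.findall("u/dep[@tier='com']")])
    print(ET.tostring(root, encoding="unicode"))
```

The queries at the end use only generic XML machinery. Once the data is in this shape, no CHAT-specific software is needed to select a speaker's utterances or a given dependent tier.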

Martin

James_L._Fidelholtz wrote:
> Alex Boulton wrote:
> <<Does anyone know of any free tools which help with transcription of
> video corpora? What we would ideally like would be a kind of video
> version of Transcriber (WinPitch is a bit complicated for our needs),
> i.e. one which allows multimedia alignment of transcription, sound & video,
> plus the usual tools for annotating etc.>>
> Hi, Alex,
> Check out the CHILDES site. They have all sorts of transcription aids,
> as well as analysis tools (as long as you transcribe in their system,
> which is basically ASCII; I'm sure it's updated by now to permit
> slightly more 'elegant' transcriptions, i.e. ANSI). The tools are quite
> useful. (Child Language Data Exchange System: childes.psy.cmu.edu/)
> Don't let the 'child language' label fool you: the system is quite
> versatile and general (and, of course, includes video capabilities of
> the sort you are looking for). Now, I have not entered deeply into
> this system, being an oldie but fogey, but my wife (who works on child
> language) uses it and swears by it.
> Jim
> James L. Fidelholtz
> Posgrado en Ciencias del Lenguaje, ICSyH
> Benemérita Universidad Autónoma de Puebla MÉXICO
>
>

--
Martin Wynne
Head of the Oxford Text Archive and AHDS Literature, Languages and Linguistics
Oxford University Computing Services
13 Banbury Road
Oxford UK - OX2 6NN
Tel: +44 1865 283299
Fax: +44 1865 273275
martin.wynne@oucs.ox.ac.uk

This archive was generated by hypermail 2b29 : Thu Nov 16 2006 - 15:03:50 MET