Re: Corpora: T-score in collocational analysis

Gordon and Pam Cain (gpcain@rivernet.com.au)
Thu, 09 Dec 1999 21:38:17 +1100

Przemyslaw--

Przemyslaw Kaszubski wrote:
>
> Regards to all the subscribers,
>
> Two questions:
>
> 1. Can anyone explain (or point to a Web source or otherwise easily
> available source apart from Church, K.W., W. Gale, P. Hanks & D. Hindle,
> "Using Statistics in Lexical Analysis", in Lexical Acquisition: Using
> On-Line Resources to Build a Lexicon. Ed. Uri Zernik. Hillsdale:
> Lawrence Erlbaum, 1991)
> the use of the t-score statistic in collocation retrieval? I mean the
> one used by Cobuild. How does the formula work? I am familiar with
> MI and Z-scores but the t-score seems to be
> in use only in the CobuildDirect service.
>

Try Jeremy Clear's explanation of the T-score (and of MI, I think) on the
Cobuild site. The reference as I have it in my bibliography is:

Clear, J 1995, 'COBUILD Bank of English explanation of stats'. Collins
COBUILD Collocation Concordancer
http://titania.cobuild.collins.co.uk/form.html
(accessed 24th April, 1999).
It's the clearest and most accessible explanation I've found.

Church and Hanks also wrote:
Church, KW, and P Hanks 1990, 'Word association norms, mutual
information, and lexicography', Computational Linguistics vol 16, no 1
(March 1990), 22-29.

You might also try:
Godby, J 1994(?), 'Two techniques for the identification of phrases in
full text'
http://www.oclc.org/oclc/research/publications/review94/part1/twotech.htm
(Accessed 15th July, 1998).

I don't remember much about it, but I think it was related.

> 2. Do you know of corpus analysis
> packages available for researchers that employ this t-score?

I'm attaching part of a posting by Oliver Mason from earlier this year.
I think the program he announces uses seven(!) different scores for
collocations, and it was developed by the Cobuild lot, so I'm sure it
would offer the T-score!

Oliver Mason wrote:
... I am pleased to announce the release of a corpus browser called
`Qwick', which is now available for download from our website at

http://www.clg.bham.ac.uk/QWICK/index.html.

Qwick allows you to construct a working corpus from a set of corpora
available on the computer, retrieve concordance lines from this using a
simple but powerful query language, and compute collocations with a
variety of adjustable parameter settings.

Qwick is implemented in Java and thus is fully platform independent; it
has been extensively tested on Windows and Solaris. . .

>
> I do small corpus research and I am basically after a tool with a statistic that does not favour rare words as much as the MI does. So far TACT's z-scores seem the best option.
>
> Przemek Kaszubski
> ==========================================
> Przemyslaw Kaszubski, M.A.
> przemka@amu.edu.pl
> http://elex.amu.edu.pl/ifa/staff/kaszubski.html
> MY (ENGLISH) (LEARNER) CORPORA PAGE: http://main.amu.edu.pl/~przemka
> School of English
> Adam Mickiewicz University
> Al. Niepodleglosci 4
> 61-874 Poznan, POLAND
> tel: +48 61 8528820 fax: +48 61 8523103
> =========================================
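
On the point about rare words, here is a minimal sketch of the difference,
assuming the approximations commonly quoted in the corpus-linguistics
literature: expected frequency E = f(node) * f(collocate) / N,
MI = log2(O / E), and t = (O - E) / sqrt(O), where O is the observed
co-occurrence count and N the corpus size. I can't confirm that these match
exactly what CobuildDirect computes (some versions also scale E by the span
width), and the function names and counts below are made up for illustration.

```python
import math


def expected(f_node, f_collocate, corpus_size):
    """Expected co-occurrence count if the two words were independent.

    Assumption: no span-width scaling is applied here.
    """
    return f_node * f_collocate / corpus_size


def mi_score(observed, f_node, f_collocate, corpus_size):
    """Mutual information: log2 of observed over expected co-occurrence."""
    return math.log2(observed / expected(f_node, f_collocate, corpus_size))


def t_score(observed, f_node, f_collocate, corpus_size):
    """t-score approximation: (observed - expected) / sqrt(observed)."""
    e = expected(f_node, f_collocate, corpus_size)
    return (observed - e) / math.sqrt(observed)


# Hypothetical counts in a corpus of 1,000,000 tokens (not real data).
N = 1_000_000
rare = dict(observed=2, f_node=1000, f_collocate=2, corpus_size=N)
frequent = dict(observed=100, f_node=1000, f_collocate=5000, corpus_size=N)

for label, counts in (("rare", rare), ("frequent", frequent)):
    print(label, "MI:", round(mi_score(**counts), 2),
          "t:", round(t_score(**counts), 2))
```

With these made-up counts, MI ranks the rare pair (two co-occurrences with a
word that appears only twice) above the frequent one, while the t-score
ranks the frequent pair higher, because dividing by sqrt(O) means a pair
needs a fair amount of evidence before it scores well.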

--
Gordon Cain 
Teacher of ESOL
TAFE International Education Centre
Liverpool (Sydney) Australia
gpcain@rivernet.com.au