Re: Corpora: Corpus Linguistics Methodologies. Was: Corpus

Mike Scott (Mike.Scott@liverpool.ac.uk)
Tue, 04 Aug 1998 11:55:52 +0100

A word in relation to Marc's point, the general thrust of which I share.

WordSmith Tools USED to use only chi-square, then incorporated Ted
Dunning's Log Likelihood routine (with help from Ted) as the default, with
chi-square still present as an option. Actually both procedures (as
implemented in WordSmith Tools) are inherently problematic, since they
compute a probability for each word-type taken on its own, as opposed to
the probability that the whole set of types in a given list is in some
sense outstanding. But what makes something "flawed" is mis-applying it. It
is very easy to do that, as for example when one starts rushing into factor
analysis in SPSS. Tools can be dangerous.
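
To make the per-word-type point concrete, here is a minimal sketch of how
chi-square and a log-likelihood (G2) score can be computed for one word-type
from a 2x2 table of counts in a study corpus and a reference corpus. It is
not WordSmith's actual code; the function name keyness and its parameters
are hypothetical, and it assumes the word occurs at least once overall and
that each corpus is non-empty.

```python
import math


def keyness(freq_study, size_study, freq_ref, size_ref):
    """Return (chi_square, log_likelihood) for ONE word-type.

    Hypothetical helper, not WordSmith's routine. The 2x2 table is:
        row 0: occurrences of the word in each corpus
        row 1: all other tokens in each corpus
    """
    observed = [
        [freq_study, freq_ref],
        [size_study - freq_study, size_ref - freq_ref],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [size_study, size_ref]
    grand_total = size_study + size_ref

    chi_square = 0.0
    log_likelihood = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                # Assumption violated (e.g. word absent from both corpora).
                raise ValueError("expected count is zero; table is degenerate")
            o = observed[i][j]
            chi_square += (o - expected) ** 2 / expected
            if o > 0:  # a zero cell contributes nothing to G2
                log_likelihood += 2 * o * math.log(o / expected)
    return chi_square, log_likelihood


# Each call scores a single type; nothing here tests the list as a whole.
chi2, g2 = keyness(freq_study=50, size_study=1000, freq_ref=10, size_ref=1000)
print(f"chi-square = {chi2:.2f}, log-likelihood = {g2:.2f}")
```

Running this once per type yields one score per word, which is exactly the
limitation described above: no single figure says whether the whole list of
types is outstanding.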

An example of what I understood Oliver & Ylva to be on about:
a PhD student of mine recently wanted to be able to examine concordance
output starting only with the node item and one item to the left, then
gradually reveal more context, so as to be able to study lexical inference
procedures. He needs to see only one concordance line at a time, and in
a large font. No tool that I'm aware of offers this facility. To make this
sort of thing available might take a linguist-programmer about the same
amount of time that it takes a linguist-non-programmer to specify to a
programmer exactly what is needed and liaise with the programmer. And in
ordinary operating systems (like Windows) it can be done at home. I do not
think all linguists SHOULD learn to do this, any more than they SHOULD
learn to ski; merely that some of us might WANT to.
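
As a rough idea of how small such a facility can be, the sketch below
generates the successive views of a single concordance line: first the node
item with one item to its left, then progressively more context. The
function name reveal_steps and its parameters are hypothetical, and widening
by one token on each side per step is an assumption, since the exact
sequence would depend on the study. The large-font, one-line-at-a-time
display is left out.

```python
def reveal_steps(tokens, node_index, max_span=5):
    """Yield successively wider views of one concordance line.

    Hypothetical helper: tokens is a list of words, node_index is the
    position of the node item within it.
    """
    # First view: the node item plus one item to its left.
    left = max(node_index - 1, 0)
    yield " ".join(tokens[left:node_index + 1])

    # Later views: one more token on each side per step (an assumption).
    for span in range(1, max_span + 1):
        left = max(node_index - 1 - span, 0)
        right = min(node_index + 1 + span, len(tokens))
        yield " ".join(tokens[left:right])


line = "the cat sat on the mat today".split()
for view in reveal_steps(line, node_index=3, max_span=2):
    print(view)
```

Each view would be shown on its own, with the reader asking for the next one
only after making a guess from the context seen so far.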

Best wishes -- Mike
******************************************
Mike Scott
Applied English Language Studies Unit
University of Liverpool, Liverpool L69 3BX
http://www.liv.ac.uk/~ms2928/homepage.html
http://www.liv.ac.uk/~ms2928/wordsmit.htm