Re: Corpora: wordcounts

Michael Barlow (barlow@ruf.rice.edu)
Wed, 14 Apr 1999 21:59:05 -0500 (CDT)

I take it that David Carlson's second experiment shows that we are not
dealing with a difference in case-sensitivity, so I have to assume that
the use of all caps in the Wordsmith wordlist and of lowercase in
the MonoConc wordlist is incidental. (But let me remind members of the
list who might be playing with this that in MonoConc the search and
frequency functions are governed by different case-sensitivity settings.)

MonoConc allows the user to determine what characters count as
word-delimiters and the default setting consists of the following:
.,;:#$^&()[]{}<>+=/\|`~"

MonoConc Pro is the same apart from the addition of !

The user can add or remove characters, but as a default list, this is
minimal.
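
For anyone who wants to experiment outside either program, here is a
minimal Python sketch of counting words with a user-defined delimiter
list. It is not MonoConc's actual code. The names MONOCONC_DEFAULT,
MONOCONC_PRO and word_counts are made up for illustration, and I am
assuming that whitespace always separates words in addition to the
listed characters.

```python
from collections import Counter

# The default delimiter list quoted above (backslash escaped for Python).
MONOCONC_DEFAULT = '.,;:#$^&()[]{}<>+=/\\|`~"'
# MonoConc Pro: the same list plus the exclamation mark.
MONOCONC_PRO = MONOCONC_DEFAULT + "!"


def word_counts(text, delimiters=MONOCONC_DEFAULT, lowercase=True):
    """Count words, treating each delimiter character as a word break.

    Assumption: whitespace also separates words.
    """
    if lowercase:
        text = text.lower()
    # Map every delimiter to a space, then split on runs of whitespace.
    table = str.maketrans({ch: " " for ch in delimiters})
    return Counter(text.translate(table).split())


sample = "Well-known words; don't split. Stop!"
default_counts = word_counts(sample)
pro_counts = word_counts(sample, MONOCONC_PRO)
```

With the default list, "stop!" is kept as a single word form because !
is not a delimiter; with the Pro list it is counted as "stop". In both
cases "well-known" and "don't" stay whole.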

I had a quick look at Wordsmith, which I use only occasionally and don't
know at all well, but I could not see an equivalent list among the many
available settings. However, I did see an option for including or
excluding hyphens as part of a word. Neither the hyphen nor the
apostrophe is in the MonoConc default list.
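
To get a feel for how much the hyphen and apostrophe could matter, here
is a rough Python sketch that counts tokens in a toy sentence under
three delimiter lists. The function name total_tokens and the sample
sentence are invented for illustration, whitespace is again assumed to
separate words, and nothing here is a claim about how Wordsmith
tokenizes internally.

```python
import re

# MonoConc's default delimiter list as quoted earlier in this message.
BASE = '.,;:#$^&()[]{}<>+=/\\|`~"'


def total_tokens(text, extra=""):
    """Return the number of tokens when BASE + extra + whitespace split words."""
    # Build a character class from the delimiters; re.escape guards
    # characters that are special inside a regular expression.
    pattern = "[" + re.escape(BASE + extra) + r"\s]+"
    return len([tok for tok in re.split(pattern, text) if tok])


sample = "The well-known editor's co-author didn't reply."

n_base = total_tokens(sample)              # hyphen and apostrophe kept
n_hyphen = total_tokens(sample, "-")       # hyphen splits words
n_both = total_tokens(sample, "-'")        # hyphen and apostrophe split
```

On this sentence the three settings give 6, 8 and 10 tokens
respectively, so a tool that splits on these two characters reports
more tokens, and higher counts for forms such as "well" or "co", than
one that does not. Whether the effect is large enough on a real corpus
is a separate question.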

I don't know if this is enough to explain the considerably lower word
frequency counts in MonoConc.

Michael
----------------------------------------------------------------------
Michael Barlow, Department of Linguistics, Rice University
barlow@rice.edu www.ruf.rice.edu/~barlow
Athelstan barlow@athel.com www.athel.com (U.S.) www.athelstan.com (UK)