Corpora: Oscar from Spain

Oscar.Martin@loctite-europe.com
Thu, 30 Oct 1997 13:54:34 +0100

Hi,
first, I'm sorry for my English. I have a question for you. Do you know
which language is richer in vocabulary, Spanish or English? It's a
little bet with a few workmates. I wrote an e-mail to Bob Krovetz and he
gave me your address. Please help us.
What is the difference between the "voices" and the "words" that a
language has? For example, Spanish has 85,300 "voices" in its official
dictionary but about 3,000,000 words. Could you explain the difference
to me? Thanks a lot for your time.

Here is Bob's e-mail:

From: Bob Krovetz <krovetz@research.nj.nec.com>
To: mar0074@ibm.net

You could address your message to the corpora list: corpora@lists.uib.no

It is a difficult question to answer though - how do you count? What do
you do about predictable morphological variants - does that count as a
new word? What about a word that has an embedded space (such as "white
house" or "operating system")? Do they count as one word or two? What
about proper nouns? If you don't want to count them, how are you going
to exclude them? I think you can see some of the problems in answering
your question.

Spanish clearly has a richer morphological system than English, and the
vocabulary would therefore have a greater proportion of morphological
variants. But I *think* English borrows words from other languages more
freely than does Spanish.

Bob
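
The counting questions Bob raises can be made concrete with a small
sketch. It is only an illustration, not a method proposed in the
message: the lemma table, the multiword list, the sample sentence and
all function names are hypothetical, and proper nouns are not handled
at all (lowercasing simply hides them). It shows how the "vocabulary
size" of the same text changes depending on whether inflected forms are
folded into one headword and whether words with an embedded space count
as one word.

```python
import re

# Hypothetical toy lemma table: inflected form -> dictionary headword.
LEMMAS = {
    "hablo": "hablar", "hablas": "hablar", "hablamos": "hablar",
    "houses": "house", "ran": "run", "running": "run",
}

# Hypothetical list of two-word units to treat as a single word.
MULTIWORD = {("operating", "system"), ("white", "house")}

SAMPLE = "Hablo y hablas; hablamos. The operating system ran, running on houses."


def tokenize(text):
    """Split text into lowercase runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def merge_multiword(tokens):
    """Join adjacent tokens that form a known two-word unit."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in MULTIWORD:
            out.append(" ".join(pair))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def count_vocabulary(text, merge_spaces=False, lemmatize=False):
    """Count distinct words under the chosen counting conventions."""
    tokens = tokenize(text)
    if merge_spaces:
        tokens = merge_multiword(tokens)
    if lemmatize:
        # Forms missing from the toy table are left unchanged.
        tokens = [LEMMAS.get(t, t) for t in tokens]
    return len(set(tokens))


if __name__ == "__main__":
    for merge in (False, True):
        for lemma in (False, True):
            n = count_vocabulary(SAMPLE, merge_spaces=merge, lemmatize=lemma)
            print(f"merge_spaces={merge}, lemmatize={lemma}: {n}")
```

On the sample sentence the count ranges from 11 distinct word forms
down to 7 once variants are folded together and "operating system" is
treated as one word, which is the same kind of gap as the one between a
dictionary's entry count and the number of word forms in a language.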