[Corpora-List] Re: problems with Google counts

From: FIDELHOLTZ_DOOCHIN_JAMES_LAWRENCE (jfidel@siu.buap.mx)
Date: Thu Mar 17 2005 - 03:25:50 MET


    Hi, Corpora Guys,

    Sorry, I don't remember who suggested simply repeating the word in a
    Google query to get a supposedly more realistic count of the pages
    containing it (I had deleted all those messages after reading them). I
    tried this yesterday on a couple of Spanish words (eficaz, eficiente). (By
    the way, the results were apparently consonant with a student's search of
    the 100,000,000-word corpusdelespañol site.) Anyway, what repeating the
    word apparently does is limit the results to those pages which contain the
    word at least twice, in this case cutting the numbers down by roughly 10%.
    If that is what is happening, it implies serious problems for relatively
    rare words, which may not occur twice in very many pages at all. At any
    rate, the proportional decrease in pages found seemed to be about the same
    for both words. (We're talking here about roughly 1.5M original hits.) If
    I'm missing the point of the suggestion, please straighten me out.
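
    To make the guess above concrete, here is a small Python sketch of what
    such a filter would do if it really does keep only pages containing the
    word at least twice. Everything in it is an assumption for illustration:
    the toy pages, the function names, and the whole-word matching rule are
    made up, and nothing here queries Google or reproduces the 10% figure.

```python
import re

def pages_with_at_least(pages, word, k=1):
    """Count pages in which `word` occurs at least k times (whole word, case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(1 for text in pages if len(pattern.findall(text)) >= k)

def relative_drop(pages, word):
    """Fraction of matching pages lost when requiring two occurrences instead of one."""
    once = pages_with_at_least(pages, word, 1)
    twice = pages_with_at_least(pages, word, 2)
    if once == 0:
        return None  # the word does not occur at all, so no ratio is defined
    return (once - twice) / once

# Hypothetical toy "web": four invented pages, not real data.
TOY_PAGES = [
    "un método eficaz y un remedio eficaz",
    "una solución eficaz",
    "el motor es eficiente; muy eficiente",
    "un proceso eficiente y eficaz",
]

for w in ("eficaz", "eficiente"):
    print(w,
          pages_with_at_least(TOY_PAGES, w, 1),
          pages_with_at_least(TOY_PAGES, w, 2),
          relative_drop(TOY_PAGES, w))
```

    On a real collection, a rare word would tend to show a much larger
    relative drop than a common one, since few pages repeat it; that is the
    worry raised above.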

    Jim

    James L. Fidelholtz
    Posgrado en Ciencias del Lenguaje, ICSyH
    Benemérita Universidad Autónoma de Puebla MÉXICO


