RE: [Corpora-List] Query about nomenclature

From: Andrew Kehoe (Andrew.Kehoe@uce.ac.uk)
Date: Fri Mar 11 2005 - 16:33:01 MET

    John

    You need to use the search term "ngram -perl" rather than "ngram not
    perl" because, as Stefan Evert pointed out, "ngram not perl" just
    returns pages containing all 3 of those words.
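    The difference between the two queries can be illustrated with a toy
    matcher. This is a minimal sketch, not Google's implementation. It
    assumes only what is said above: bare words are ANDed together, and a
    leading "-" excludes a term. The page texts and the helper names
    parse_query and matches are hypothetical.

```python
# Toy model of the two query forms. NOT Google's implementation: it only
# assumes bare words are ANDed and a leading "-" excludes a term.

def parse_query(query: str) -> tuple[set[str], set[str]]:
    """Split a query into required and excluded terms (assumed syntax)."""
    required: set[str] = set()
    excluded: set[str] = set()
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            # "not" is treated as an ordinary word, like any other term
            required.add(token)
    return required, excluded


def matches(page_text: str, query: str) -> bool:
    """True if the page has every required word and no excluded word."""
    words = set(page_text.lower().split())
    required, excluded = parse_query(query)
    return required <= words and not (excluded & words)


# Hypothetical pages, for illustration only
pages = {
    "a": "an ngram counter written in perl",
    "b": "ngram models do not need perl",
    "c": "ngram statistics for corpus linguistics",
}

for q in ("ngram not perl", "ngram -perl"):
    hits = sorted(name for name, text in pages.items() if matches(text, q))
    print(f"{q!r}: {hits}")
```

    Under these assumptions "ngram not perl" returns only page "b", which
    actually mentions Perl, while "ngram -perl" returns only page "c".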

    Another problem with your method is that Google ignores hyphens in
    search terms. One of the pages returned for the term "n-gram" is
    http://cpan.dei.uc.pt/authors/id/J/JH/JHI/ngram.pl-1.48&e=8092 but this
    page does not contain the word "n-gram" at all, only "ngram" without the
    hyphen.
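    The effect of ignoring hyphens can be sketched as a normalization step
    applied to both the query term and the page text before comparison.
    This is an illustrative assumption modelled on the behaviour described
    above, not Google's actual tokenizer. The page text and the functions
    normalize, tokens and contains_term are hypothetical.

```python
import re

# Hypothetical tokenizer mimicking the behaviour described above:
# hyphens are dropped before comparison, so "n-gram" and "ngram" become
# the same token. An illustrative assumption, not Google's code.

def normalize(token: str) -> str:
    return token.lower().replace("-", "")


def tokens(text: str) -> set[str]:
    # Keep hyphenated words together, then normalize; drop empty results
    normalized = (normalize(t) for t in re.findall(r"[\w-]+", text))
    return {t for t in normalized if t}


def contains_term(page_text: str, term: str) -> bool:
    return normalize(term) in tokens(page_text)


# Hypothetical page that never uses the hyphenated spelling
page = "A Perl script that counts ngram frequencies"

print(contains_term(page, "n-gram"))  # hyphen ignored, so this matches
print("n-gram" in page.lower())       # the literal hyphenated string is absent
```

    Under this assumption a hit count for "n-gram" would also include pages
    that only ever write "ngram".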

    Andrew Kehoe
    Research and Development Unit for English Studies
    School of English
    University of Central England, Birmingham
    http://rdues.uce.ac.uk/
     
    http://www.webcorp.org.uk/

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of John F. Sowa
    Sent: 10 March 2005 01:43
    To: Damon Allen Davison
    Cc: John Mckenny; CORPORA@HD.UIB.NO
    Subject: Re: [Corpora-List] Query about nomenclature

    Damon Davison's use of Google inspired me to try
    a variation. I just typed three queries and
    got the following number of hits:

    Search string       Hits
    --------------    ------
    ngram             21,100
    ngram not perl       540
    n-gram            85,700

    This seems to provide overwhelming evidence for
    a hyphen between "n" and "gram". Since Google
    doesn't distinguish capitals, that leaves the
    capitalization question unresolved.

    John Sowa



    This archive was generated by hypermail 2b29 : Fri Mar 11 2005 - 17:30:03 MET