RE: [Corpora-List] Query about nomenclature

From: Andrew Kehoe (Andrew.Kehoe@uce.ac.uk)
Date: Fri Mar 11 2005 - 21:07:43 MET

    John Sowa's original queries were
     
    1) ngram
    2) ngram not perl
    3) n-gram
     
    To get more accurate results, these should be run as
     
    1) ngram
    2) ngram -perl
    3) "n-gram" (to force Google to match only 'n-gram' with a hyphen)
     
    It is not necessary to run
     
    "n-gram" -perl
     
    because (as Damon Allen Davison said) the Perl module we want to filter out of the results is called Text::Ngram not Text::N-gram.
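    As a rough illustration of the three corrected queries, the sketch
    below URL-encodes each one so that the quotes and the minus sign
    survive intact. The base URL and the names SEARCH_URL, QUERIES and
    search_url are assumptions made for this example only; nothing here
    comes from the original messages.

```python
from urllib.parse import quote_plus

# Assumed base URL, used purely for illustration.
SEARCH_URL = "https://www.google.com/search?q="

# The three queries as recommended above.
QUERIES = [
    "ngram",        # plain term
    "ngram -perl",  # minus sign excludes pages mentioning "perl"
    '"n-gram"',     # quotes ask for the hyphenated form only
]


def search_url(query: str) -> str:
    """Return a search URL with the query safely percent-encoded."""
    return SEARCH_URL + quote_plus(query)


if __name__ == "__main__":
    for q in QUERIES:
        print(f"{q:12} {search_url(q)}")
```

    Note that the double quotes must be part of the query string itself
    (encoded as %22); quoting the term only in the shell or in the
    source code would send a plain n-gram to the search engine.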
     
    Andrew Kehoe
    Research and Development Unit for English Studies
    School of English
    University of Central England, Birmingham
    http://rdues.uce.ac.uk/

    http://www.webcorp.org.uk/
     
     
    -----Original Message-----
    From: owner-corpora@lists.uib.no on behalf of Normunds Gruzitis
    Sent: Fri 11/03/2005 17:53
    To: CORPORA@HD.UIB.NO
    Cc:
    Subject: RE: [Corpora-List] Query about nomenclature



            Did you put "n-gram" in quotes in your search query?
            Google's response to me: "Results 1 - 10 of about 63,600 for
            "n-gram" -perl."
            
            Regards,
            Normunds
            
            
            -----Original Message-----
            From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
            Behalf Of Andrew Kehoe
            Sent: Friday, March 11, 2005 5:33 PM
            To: John F. Sowa
            Cc: CORPORA@HD.UIB.NO
            Subject: RE: [Corpora-List] Query about nomenclature
            
            
            John
            
            You need to use the search term "ngram -perl" rather than "ngram not
            perl" because, as Stefan Evert pointed out, "ngram not perl" just
            returns pages containing all 3 of those words.
            
            Another problem with your method is that Google ignores hyphens in
            search terms. One of the pages returned for the term "n-gram" is
            http://cpan.dei.uc.pt/authors/id/J/JH/JHI/ngram.pl-1.48&e=8092 but this
            page does not contain the word "n-gram" at all, only "ngram" without the
            hyphen.
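            The behaviour described in this message can be mimicked with
            a toy matcher. It is only a sketch of the rules as stated
            here (all terms must be present, a leading minus excludes a
            term, hyphens are ignored unless the term is quoted); it is
            not Google's actual algorithm, and the function names and
            sample pages are invented for the example.

```python
import re


def tokens(text: str, keep_hyphens: bool = False) -> set:
    """Lower-case the text and split it into a set of word tokens."""
    text = text.lower()
    if not keep_hyphens:
        text = text.replace("-", "")  # "n-gram" collapses to "ngram"
    return set(re.findall(r"[a-z0-9-]+", text))


def matches(query: str, page: str) -> bool:
    """Toy model: AND all terms, '-term' excludes, quotes keep hyphens.

    Only single-word quoted terms are handled in this sketch.
    """
    for term in re.findall(r'-?"[^"]+"|\S+', query):
        exclude = term.startswith("-") and len(term) > 1
        if exclude:
            term = term[1:]
        quoted = len(term) >= 2 and term.startswith('"') and term.endswith('"')
        if quoted:
            term = term[1:-1]
        words = tokens(page, keep_hyphens=quoted)
        key = term.lower() if quoted else term.lower().replace("-", "")
        found = key in words
        if found == exclude:  # required term missing, or excluded term present
            return False
    return True


if __name__ == "__main__":
    perl_page = "ngram.pl computes ngram statistics in Perl"
    print(matches("n-gram", perl_page))
    print(matches('"n-gram"', perl_page))
    print(matches("ngram not perl", perl_page))
    print(matches("ngram -perl", perl_page))
```

            Under this model the unquoted n-gram matches a page that
            contains only "ngram", while the quoted form does not, and
            "ngram not perl" requires the literal word "not" to appear
            on the page.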
            
            Andrew Kehoe
            Research and Development Unit for English Studies
            School of English
            University of Central England, Birmingham
            http://rdues.uce.ac.uk/
            
            http://www.webcorp.org.uk/
            
            -----Original Message-----
            From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
            Behalf Of John F. Sowa
            Sent: 10 March 2005 01:43
            To: Damon Allen Davison
            Cc: John Mckenny; CORPORA@HD.UIB.NO
            Subject: Re: [Corpora-List] Query about nomenclature
            
            Damon Davison's use of Google inspired me to try
            a variation. I just typed three queries and
            got the following number of hits:
            
            Search string      Hits
            --------------   ------
            ngram            21,100
            ngram not perl      540
            n-gram           85,700
            
            This seems to provide overwhelming evidence for
            a hyphen between "n" and "gram". Since Google
            doesn't distinguish capitals, that leaves the
            capitalization question unresolved.
            
            John Sowa
            
    This archive was generated by hypermail 2b29 : Fri Mar 11 2005 - 21:19:20 MET