RE: [Corpora-List] Re: problems with Google

From: Deane, Paul (pdeane@ets.org)
Date: Thu Mar 17 2005 - 16:39:56 MET


    Has anybody checked whether Google's Web API behaves differently from its
    standard search?
     
    I have code using the Java Web API which makes use of the asterisk to blank
    out a single word (not an unrestricted wildcard). As of yesterday, when I
    tested the code, it still appeared to be working as designed.
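     
    To make the distinction concrete, here is a minimal Python sketch of what
    "blank out a single word" means when checking text locally. It is not my
    Web API code and says nothing about how Google implements the asterisk;
    the function name single_word_wildcard_pattern is made up for
    illustration, and a "word" is assumed to be any run of non-whitespace
    characters.

```python
import re


def single_word_wildcard_pattern(query):
    """Compile a regex in which each '*' stands for exactly one word.

    Assumption for this sketch: a word is a run of non-whitespace
    characters, and words are separated by whitespace.
    """
    parts = []
    for token in query.split():
        if token == "*":
            parts.append(r"\S+")            # exactly one word, never several
        else:
            parts.append(re.escape(token))  # literal word from the query
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


if __name__ == "__main__":
    pattern = single_word_wildcard_pattern("what does * mean")
    for text in ("What does serendipity mean?", "what does this word mean"):
        print(text, "->", bool(pattern.search(text)))
```

    An unrestricted wildcard would also accept the second test string, where
    two words fill the gap; the single-word reading rejects it.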

    -----Original Message-----
    From: Andrew Kehoe [mailto:Andrew.Kehoe@uce.ac.uk]
    Sent: Thursday, March 17, 2005 9:27 AM
    To: CORPORA@uib.no
    Subject: RE: [Corpora-List] Re: problems with Google

    John
     
    Even if you put double quotes around the wildcard character Google will
    ignore it. When you search for:
     
    "what does "*" mean"
     
    Google is actually searching for 2 'phrases': "what does " and " mean". You
    cannot nest double quotes in Google so the double quotes around the * are
    actually closing your initial quote and beginning a new quote, with the
    wildcard ignored completely.
     
    It may be the case that SOME of the pages Google returns will contain "what
    does", followed by one other word, followed by "mean" but your query does
    not ask for this specifically. Google could (and does) also return pages
    containing "mean" and "what does" in the opposite order, or with multiple
    words in between.
     
    Similarly, "what does "*" "*" mean" is actually searching for 3 'phrases':
    1) "what does ", 2) " " (a space), and 3) " mean".
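     
    The quote pairing described above can be modelled in a few lines of
    Python. This is only a sketch of the behaviour as I have described it, not
    Google's actual parser; the function name split_quoted_phrases is made up
    for illustration, and the sketch simply assumes that every double quote
    toggles between "inside a phrase" and "outside a phrase".

```python
def split_quoted_phrases(query):
    """Split a query into quoted phrases and leftover unquoted text.

    Assumption for this sketch: quotes cannot nest, so each double quote
    either opens a phrase or closes the one currently open.
    """
    segments = query.split('"')
    # After splitting on the quote character, odd-numbered segments lie
    # between an opening and a closing quote; even-numbered ones lie outside.
    phrases = segments[1::2]
    unquoted = [s for s in segments[0::2] if s.strip()]
    return phrases, unquoted


if __name__ == "__main__":
    for query in ('"what does "*" mean"', '"what does "*" "*" mean"'):
        phrases, unquoted = split_quoted_phrases(query)
        print(query)
        print("  phrases: ", phrases)
        print("  unquoted:", unquoted)
```

    For the first query this yields the two phrases "what does " and " mean",
    and for the second it yields three, with a lone space in the middle. In
    both cases every * ends up outside the quotes, which fits with the
    wildcard being ignored.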
     
    So, Google hasn't retained support for wildcards at all I'm afraid, and this
    is why we are developing our own search engine in WebCorp, as Antoinette
    Renouf mentioned yesterday.
     
    Andrew Kehoe
    Research and Development Unit for English Studies
    University of Central England in Birmingham
     
    http://www.webcorp.org.uk/

    -----Original Message-----
    From: owner-corpora@lists.uib.no on behalf of John Milton
    Sent: Thu 17/03/2005 13:39
    To: CORPORA@uib.no
    Cc:
    Subject: [Corpora-List] Re: problems with Google

    I just discovered that Google seems to have retained some use of the
    wildcard for words if you use double quotes with the asterisk. Searches
    for "what does "*" mean" and "what does "*" "*" mean" return MAINLY hits
    with any one word and any two words, respectively, in place of the
    asterisks. If anyone else is using web searches
    as language learning/teaching resources, this also looks promising:
    http://www.findforward.com/

    John Milton
    Hong Kong University of Science & Technology


    This archive was generated by hypermail 2b29 : Thu Mar 17 2005 - 16:49:46 MET