RE: [Corpora-List] Re: problems with Google

From: Marian Olteanu (mou_softwin@yahoo.com)
Date: Fri Mar 18 2005 - 08:47:13 MET

    Well, the results from the Google API have ALWAYS been a little (or more than a little)
    different from those reported by http://www.google.com/. You will see the results in a
    different order, and a small (or big) difference in the counts, which is what we are
    interested in.
    --- "Deane, Paul" <pdeane@ets.org> wrote:

    > Has anybody checked whether the behavior with Google's Web API and its
    > standard search is different?
    >
    > I have code using the Java Web API which makes use of the asterisk to blank
    > out a single word (not an unrestricted wildcard.) As of yesterday, when I
    > tested the code, it still appeared to be working as designed.
    >
    > -----Original Message-----
    > From: Andrew Kehoe [mailto:Andrew.Kehoe@uce.ac.uk]
    > Sent: Thursday, March 17, 2005 9:27 AM
    > To: CORPORA@uib.no
    > Subject: RE: [Corpora-List] Re: problems with Google
    >
    >
    >
    > John
    >
    > Even if you put double quotes around the wildcard character Google will
    > ignore it. When you search for:
    >
    > "what does "*" mean"
    >
    > Google is actually searching for 2 'phrases': "what does " and " mean". You
    > cannot nest double quotes in Google so the double quotes around the * are
    > actually closing your initial quote and beginning a new quote, with the
    > wildcard ignored completely.
    >
    > It may be the case that SOME of the pages Google returns will contain "what
    > does", followed by one other word, followed by "mean" but your query does
    > not ask for this specifically. Google could (and does) also return pages
    > containing "mean" and "what does" in the opposite order, or with multiple
    > words in between.
    >
    > Similarly, "what does "*" "*" mean" is actually searching for 3 'phrases':
    > 1) "what does ", 2) " " (a space), and 3) " mean".
    >
    > So, Google hasn't retained support for wildcards at all I'm afraid, and this
    > is why we are developing our own search engine in WebCorp, as Antoinette
    > Renouf mentioned yesterday.
    >
    > Andrew Kehoe
    > Research and Development Unit for English Studies
    > University of Central England in Birmingham
    >
    > http://www.webcorp.org.uk/
    >
    > -----Original Message-----
    > From: owner-corpora@lists.uib.no on behalf of John Milton
    > Sent: Thu 17/03/2005 13:39
    > To: CORPORA@uib.no
    > Cc:
    > Subject: [Corpora-List] Re: problems with Google
    >
    >
    >
    > I just discovered that Google seems to have retained some use of the
    > wildcard for words if you use double quotes with the asterisk. A search
    > for "what does "*" mean" and "what does "*" "*" mean" returns MAINLY hits with
    > any one word and any two words, respectively, in place of the asterisks. If
    > anyone else is using web searches as language learning/teaching resources,
    > this also looks promising:
    > http://www.findforward.com/
    >
    > John Milton
    > Hong Kong University of Science & Technology

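    The quote-pairing behaviour Andrew Kehoe describes in the quoted message can be
    sketched roughly as below. This is only a toy model of the parsing as described in
    this thread, not Google's actual implementation; the function names
    (split_quoted_phrases, contains_all, one_word_gap) are made up for illustration.
    It pairs double quotes left to right, so the asterisk ends up outside every quoted
    span, and it contrasts an "all phrases present, in any order" match with a true
    one-word gap match.

```python
import re


def split_quoted_phrases(query):
    """Return the text inside each double-quoted span, pairing quotes left to right."""
    # Quotes cannot nest: each '"' closes the span opened by the previous unmatched '"'.
    return re.findall(r'"([^"]*)"', query)


def contains_all(text, phrases):
    """True if every non-blank phrase occurs somewhere in text, in any order."""
    return all(p.strip() in text for p in phrases if p.strip())


def one_word_gap(text, before, after):
    """True if `before`, exactly one word, then `after` occur in sequence."""
    pattern = r'\b' + re.escape(before) + r'\s+\S+\s+' + re.escape(after) + r'\b'
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    for query in ('"what does "*" mean"', '"what does "*" "*" mean"'):
        print(query, "->", split_quoted_phrases(query))

    phrases = split_quoted_phrases('"what does "*" mean"')
    page = "I mean it: what does he want"
    # Both phrases are present, but not in the order the user intended.
    print(contains_all(page, phrases))
    print(one_word_gap(page, "what does", "mean"))
```

    Under this model the page "I mean it: what does he want" satisfies the two-phrase
    query even though it never contains "what does", one word, then "mean", which is
    the kind of hit Andrew mentions.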
    Marian
    http://www.utdallas.edu/~mgo031000/

                    


