Re: [Corpora-List] Query on the use of Google for corpus research

From: Chris Jordan (cjordan@cs.dal.ca)
Date: Fri May 27 2005 - 14:14:09 MET DST


    Hello,

    I would recommend the following reference, which is closely related:
    Craig Silverstein, Monika Henzinger, Hannes Marais, and Michael Moricz.
    Analysis of a Very Large AltaVista Query Log. Technical Note 1998-014,
    Digital SRC, 1998.
    http://gatekeeper.dec.com/pub/DEC/SRC/technicalnotes/abstracts/src-tn-1998-014.html

    There are some interesting issues in examining such data. The first
    that comes to mind is that you have to be able to distinguish between
    search sessions. This is non-trivial because users typically do not
    have a single goal when searching; there is some work by Spink on this
    topic. Gathering the query data at the client side and at the server
    side each has its own set of problems.

    As statistics are being gathered, it is important to discuss the
    properties of the user group (sample population) being evaluated. The
    diversity of the sample (or lack of it) will determine what kind of
    conclusions can be drawn.

    Hope that helps,

    Chris

    Peter K Tan wrote:

    > Just forwarding a question from a colleague. Would be grateful for
    > comments.
    >
    > Cheers,
    > Peter
    >
    > From: Michelle Maria Lazar
    > Sent: 27 May 2005 11.27
    > To: Peter K W Tan; Talib, I S; Vincent Ooi; Wee Hock Ann, Lionel
    > Subject: Query on the use of Google for corpus research
    >
    > Hi all,
    >
    > Someone has written to ask me whether there's any foreseeable
    > problem/objection in using Google to gather statistical evidence
    > on particular language usage, using keyword searches. It concerns
    > an article submission that is currently under review. Does anyone
    > have any experience/insight on this?
    >
    > Cheers,
    >
    > Michelle
    >



    This archive was generated by hypermail 2b29 : Fri May 27 2005 - 14:57:18 MET DST