Re: [Corpora-List] Query on the use of Google for corpus research

From: Chris Jordan (cjordan@cs.dal.ca)
Date: Fri May 27 2005 - 15:05:22 MET DST


    Oops,

    Typo in the URL of my last message. Sorry about that.

    http://gatekeeper.dec.com/pub/DEC/SRC/technical-notes/abstracts/src-tn-1998-014.html

    Chris Jordan wrote:

    > Hello,
    >
    > I would recommend looking at the following reference, as it is
    > closely related:
    > Craig Silverstein, Monika Henzinger, Hannes Marais, and Michael
    > Moricz. Analysis of a Very Large AltaVista Query Log. Technical Note
    > 1998-014, Digital SRC, 1998.
    > http://gatekeeper.dec.com/pub/DEC/SRC/technicalnotes/abstracts/src-tn-1998-014.html
    >
    >
    > There are some interesting issues with regard to examining such data.
    > The first that really comes to mind is that you have to be able to
    > distinguish between search sessions. This is non-trivial as users
    > typically do not have a single goal when searching; there is some work
    > by Spink on this topic. Gathering this query data on the client
    > side and on the server side each has its own set of problems.
    >
    > As statistics are being gathered, it is important to discuss
    > properties of the user group (sample population) being evaluated.
    > The diversity of the sample (or lack of it) will determine what
    > kind of conclusions can be drawn.
    >
    > Hope that helps,
    >
    > Chris
    >
    > Peter K Tan wrote:
    >
    >> Just forwarding a question from a colleague. Would be grateful for
    >> comments.
    >>
    >> Cheers,
    >> Peter
    >>
    >> From: Michelle Maria Lazar
    >> Sent: 27 May 2005 11.27
    >> To: Peter K W Tan; Talib, I S; Vincent Ooi; Wee Hock Ann, Lionel
    >> Subject: Query on the use of Google for corpus research
    >>
    >> Hi all,
    >> Someone has written to ask me whether there's any foreseeable
    >> problem/objection in using Google to gather statistical evidence
    >> on particular language usage, using key word searches. It involves
    >> a submission of an article currently under review. Does anyone
    >> have any experience/insight on this?
    >>
    >> Cheers,
    >>
    >> Michelle
    >>
    >


