Re: [Corpora-List] Query on the use of Google for corpus research

From: Alexander Schutz (goalscoringsuperstarhero@gmail.com)
Date: Wed Jun 01 2005 - 00:38:24 MET DST

    I see your point in everything you are saying, if you really
    (and desperately) want to compile this billion-word corpus
    from the web.
    But then again, why not simply go to UPenn and purchase a
    license for English Gigaword, plus some additional corpora of
    tens of millions of words, from the LDC? It's all nicely marked
    up, and you don't have to deal with all those crawling and
    postprocessing problems at all, not to mention storage.

    Cheers,
    Alex

    On 5/31/05, Marco Baroni <baroni@sslmit.unibo.it> wrote:
    > In my experience, adding and changing samples indefinitely until I have
    > about 1 billion words of web-data with the characteristics I need turns
    > out to be a pretty difficult thing to do... if you can suggest a procedure
    > to do this in an easy way, I (and, I suspect, "most corpus linguists")
    > would be very grateful.
    >


