Re: [Corpora-List] Query on the use of Google for corpus research

From: Alexander Schutz (goalscoringsuperstarhero@gmail.com)
Date: Wed Jun 01 2005 - 00:38:24 MET DST

Next message: Marco Baroni: "Re: [Corpora-List] Query on the use of Google for corpus research"

Previous message: Mark P. Line: "Re: [Corpora-List] Query on the use of Google for corpus research"
In reply to: Marco Baroni: "Re: [Corpora-List] Query on the use of Google for corpus research"
Next in thread: Marco Baroni: "Re: [Corpora-List] Query on the use of Google for corpus research"
Next in thread: Tom Emerson: "Re: [Corpora-List] Query on the use of Google for corpus research"
Reply: Marco Baroni: "Re: [Corpora-List] Query on the use of Google for corpus research"

I see your point, provided you really (and desperately) want to
compile this billion-word corpus from the web.
But then again, why not simply go to UPenn and purchase a license
for English Gigaword, plus some additional corpora of tens of
millions of words, from LDC? It's all nicely marked up, and you
don't have to deal with any of those crawling and postprocessing
problems, not to mention storage.

Cheers,
Alex

On 5/31/05, Marco Baroni <baroni@sslmit.unibo.it> wrote:
> In my experience, adding and changing samples indefinitely until I have
> about 1 billion words of web-data with the characteristics I need turns
> out to be a pretty difficult thing to do... if you can suggest a procedure
> to do this in an easy way, I (and, I suspect, "most corpus linguists")
> would be very grateful.
>

This archive was generated by hypermail 2b29 : Wed Jun 01 2005 - 00:55:49 MET DST