Re: [Corpora-List] Query on the use of Google for corpus research

From: Marco Baroni (baroni@sslmit.unibo.it)
Date: Wed Jun 01 2005 - 00:56:13 MET DST

Next message: Marco Baroni: "[Corpora-List] web-corpora, big and small"

Previous message: Alexander Schutz: "Re: [Corpora-List] Query on the use of Google for corpus research"
In reply to: Alexander Schutz: "Re: [Corpora-List] Query on the use of Google for corpus research"
Next in thread: Tom Emerson: "Re: [Corpora-List] Query on the use of Google for corpus research"

> But then again, why not go simply to UPenn and purchase some
> license for English Gigaword plus some additional tens of millions
> words corpora from LDC?

For example, because I'm also interested in 1 billion words of Italian,
German and Japanese? Or because I think that the web can give us a more
varied picture of a language than a newswire corpus? But more generally,
because I think that, with all the linguistic data available out there on
the web (probably orders of magnitude more data than the whole LDC and
ELDA catalogues put together), it is a good idea to develop/gather/share
tools and procedures to get them into "corpus format"...

Which, of course, does not mean that prefab corpora do not have their
place as well.

Regards,

Marco

