Re: [Corpora-List] Query on the use of Google for corpus research

From: Marco Baroni (baroni@sslmit.unibo.it)
Date: Wed Jun 01 2005 - 00:56:13 MET DST


    > But then again, why not go simply to UPenn and purchase some
    > license for English Gigaword plus some additional tens of millions
    > words corpora from LDC?

    For example, because I'm also interested in 1 billion words of Italian,
    German and Japanese? Or because I think that the web can give us a more
    varied picture of a language than a newswire corpus? But more generally,
    because I think that, with all the linguistic data available out there on
    the web (probably orders of magnitude more data than the whole LDC and
    ELDA catalogues put together), it is a good idea to develop, gather and
    share tools and procedures to get that data into "corpus format"...

    Which, of course, does not mean that prefab corpora do not have their
    function as well.

    Regards,

    Marco


