RE: [Corpora-List] Spanish reference corpus

From: Adam Kilgarriff (adam@lexmasterclass.com)
Date: Fri Feb 02 2007 - 08:54:00 MET

    Mario,

    Yes, the frequencies etc. are available for this corpus via the Sketch
    Engine, a corpus query tool which allows the user to specify and collect
    frequency lists to a wide range of specifications (as well as offering a
    range of other functions including concordancing, 'word sketches' and a
    distributional thesaurus).
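
    For readers who only need a plain word-frequency list from text they
    already hold locally, the idea can be sketched in a few lines of Python.
    This is only an illustration under simple assumptions (lowercasing,
    tokens are runs of letters); it is not how the Sketch Engine computes its
    lists, and the function name frequency_list and the sample sentence are
    made up for the example.

```python
import re
from collections import Counter


def frequency_list(text, min_count=1):
    """Return (word, count) pairs, most frequent first, ties alphabetical.

    Hypothetical helper for illustration; tokenization is deliberately naive.
    """
    # Lowercase, then keep runs of letters only (accented letters included,
    # digits and underscores excluded).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    pairs = [(word, n) for word, n in counts.items() if n >= min_count]
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


# Made-up sample text, not taken from the corpus discussed here.
sample = "El corpus es grande. El corpus es útil y el acceso es libre."
for word, n in frequency_list(sample, min_count=2):
    print(word, n)
```

    A real reference list would also need decisions on lemmatization,
    tokenization of clitics and handling of duplicates, which this sketch
    ignores.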

    We have taken the URL list as supplied by Serge Sharoff, re-collected the
    corpus (or, at least, a 95% similar corpus) and installed it into the Sketch
    Engine. You can self-register for a trial account at
    http://www.sketchengine.co.uk

    Enjoy!

    Adam

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Mario Crespo Miguel
    Sent: 01 February 2007 13:17
    To: s.sharoff@leeds.ac.uk
    Cc: corpora@lists.uib.no
    Subject: Re: [Corpora-List] Spanish reference corpus

    Thank you very much for your help, but it would be more
    convenient for me if the word frequencies of this open-domain /
    general corpus could be obtained. Does anybody know whether
    such information is available in some way? Best,

    Mario

    On 30 Jan 2007 at 16:10, Serge Sharoff <s.sharoff@leeds.ac.uk>
    wrote:

    > one answer is the Spanish Internet corpus with the interface from
    > http://corpus.leeds.ac.uk/internet.html
    > and the URL list
    > http://corpus.leeds.ac.uk/internet/final-url-es.gz
    >
    > This is a random snapshot of the Spanish Internet of about 120
    > million words, see
    > Sharoff, S (2006) Creating general-purpose corpora using automated
    > search engine queries. In Marco Baroni and Silvia Bernardini,
    > editors, WaCky! Working papers on the Web as Corpus. Gedit, Bologna.
    > http://wackybook.sslmit.unibo.it/
    >
    > S
    >
    > On Tue, 2007-01-30 at 15:54 +0100, Mario Crespo Miguel wrote:
    >> Dear everybody,
    >>
    >> Thank you again for all the help that I always get from this
    >> mailing list. This time I would like to ask whether there is
    >> a reference / open-domain corpus for Spanish that is freely
    >> available and can be downloaded. Thank you in advance. Best
    >> wishes,
    >>
    >> Mario Crespo Miguel
    >>
    >>
    >
    >



    This archive was generated by hypermail 2b29 : Fri Feb 02 2007 - 08:51:53 MET