Re: [Corpora-List] Irish language corpora

From: Kevin Scannell (kscanne@gmail.com)
Date: Sat Nov 25 2006 - 00:09:52 MET


    On 16:31 Fri 24 Nov , Mike Maxwell wrote:
    > fitzgerr@aston.ac.uk wrote:
    > >I am looking for a corpus of Irish language for some research, but all I
    > >seem to be able to find are corpora based on literary texts, predominantly
    > >dated from before the 20th Century. For my research purposes, I need a
    > >corpus that contains terminology that is as contemporary as possible.
    >
    > I presume you've looked at the NCI (New Corpus for Ireland), and that
    > it doesn't meet your needs.
    >
    > Have you looked at Kevin Scannell's collection
    > (http://borel.slu.edu/crubadan/index.html)? Looks like he has a 25M
    > word corpus of Irish, which I believe he collected entirely off the web.

    Yes, I have large web-crawled corpora from the crubadan project,
    and also from some ongoing web crawling in support of my
    search engine www.aimsigh.com (a description of that site in
    English is here: http://www.aimsigh.com/eolas.html - some list
    members might find the ideas behind the site interesting even though
    it only supports Irish at the moment). In all, there are about
    100 million words of Irish on the web that are indexed by the site.

    Ronan, feel free to write me off-list and I can see about putting
    together something suitable for you.

    -Kevin


