Re: [Corpora-List] free tagged corpus

From: Martin Wynne (martin.wynne@oucs.ox.ac.uk)
Date: Thu Nov 17 2005 - 12:13:28 MET


    Dear Delip,

    It depends on what you mean by 'freely available'. This has (at least)
    two meanings in this context. It can mean free of cost, or it can mean
    free of legal or ethical restrictions on its use.

    Many corpora do not cost money to use, although the ones mentioned
    so far in this thread, such as the BNC and resources from the LDC, do
    cost money.

    As for legal and ethical restrictions, it may be useful to look at what
    they say in the world of software, where several levels of freedom can
    be differentiated:

         * The freedom to run the program, for any purpose (freedom 0).
         * The freedom to study how the program works, and adapt it to your
    needs (freedom 1). Access to the source code is a precondition for this.
         * The freedom to redistribute copies so you can help your neighbor
    (freedom 2).
         * The freedom to improve the program, and release your improvements
    to the public, so that the whole community benefits (freedom 3). Access
    to the source code is a precondition for this.

    (from http://www.gnu.org/philosophy/free-sw.html)

    With corpora, a parallel classification may be possible:

         * The freedom to access and analyse the corpus (freedom 0).
         * The freedom to run your own tools on the corpus, and adapt it to
    your needs (freedom 1). Access to the full text of the corpus is a
    precondition for this.
         * The freedom to redistribute copies so you can help your neighbor
    (freedom 2).
         * The freedom to add texts or metadata or annotations, and release
    your improvements to the public, so that the whole community benefits
    (freedom 3).

    In most cases, any of the above freedoms may be restricted to
    academic or non-commercial research, though the precise terms of these
    restrictions may vary, and the boundaries of 'non-commercial' use may
    not be easy to draw.

    Usually a corpus creator cannot simply release a corpus under terms of
    their choosing, allowing whichever of the above freedoms they want to,
    because they don't own the rights over all of the texts contained in the
    corpus. A corpus usually contains texts written or spoken by various
    people, and these people, or publishers, or employers, or others, are
    likely to have intellectual property rights over these texts.
    (Furthermore, the corpus builders acquire rights over the
    collection, but these may reside not in the individuals but in their
    institution or funders.) To complicate things further, the relevant laws
    relating to these rights vary in different countries, and have varied
    over time.

    My colleague Lou Burnard asked a similar question on this list in
    January this year. You can see the start of the thread in the archive at
    http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0501&L=CORPORA&D=0&I=-3&P=13048
    He was surprised to find virtually nothing which could be distributed
    under something like an open source software licence.

    The simplest answer to this is that you have to say a bit more precisely
    what it is you want to be free to do with the corpus, and then maybe
    you'll get some more answers.

    Best wishes,
    Martin

    Delip Rao wrote:
    > Hello All,
    >
    > Is there any freely available part-of-speech tagged
    > corpus for research/non-commercial use?
    >
    > Thanks,
    > Delip Rao
    > -----------
    > AIDB LAB,
    > IIT MADRAS

    -- 
    Martin Wynne
    Head of the Oxford Text Archive and
    AHDS Literature, Languages and Linguistics
    

    Oxford University Computing Services
    13 Banbury Road
    Oxford UK - OX2 6NN
    Tel: +44 1865 283299
    Fax: +44 1865 273275
    martin.wynne@oucs.ox.ac.uk


