Re: [Corpora-List] free tagged corpus

From: Delip Rao (deliprao@yahoo.com)
Date: Thu Nov 17 2005 - 20:05:33 MET


    Dear Martin/All,

    By "free" I meant $0, not "freedom". As a research
    student I would be willing to comply with the
    legal/ethical restrictions etc. Most standard
    literature in good conferences uses corpora from
    sources like LDC, which are not available free of cost.
    If my organization is not a member of LDC then I would
    not have access to these. Are there any free-of-cost
    PoS-tagged corpora for experimentation that are well
    accepted by the research community?

    Thanks,
    Delip

    --- Martin Wynne <martin.wynne@oucs.ox.ac.uk> wrote:

    > Dear Delip,
    >
    > It depends on what you mean by 'freely available'.
    > This has (at least)
    > two meanings in this context. It can mean free of
    > cost, or it can mean
    > free of legal or ethical restrictions on its use.
    >
    > Many corpora do not cost money to use, although
    > the ones mentioned
    > so far in this thread, such as the BNC and resources
    > from the LDC, do
    > cost money.
    >
    > As for legal and ethical restrictions, it may be
    > useful to look at what
    > they say in the world of software, where several
    > levels of freedom can
    > be differentiated:
    >
    > * The freedom to run the program, for any
    > purpose (freedom 0).
    > * The freedom to study how the program works,
    > and adapt it to your
    > needs (freedom 1). Access to the source code is a
    > precondition for this.
    > * The freedom to redistribute copies so you can
    > help your neighbor
    > (freedom 2).
    > * The freedom to improve the program, and
    > release your improvements
    > to the public, so that the whole community benefits
    > (freedom 3). Access
    > to the source code is a precondition for this.
    >
    > (from http://www.gnu.org/philosophy/free-sw.html)
    >
    > With corpora, a parallel classification may be
    > possible:
    >
    > * The freedom to access and analyse the corpus
    > (freedom 0).
    > * The freedom to run your own tools on the
    > corpus, and adapt it to
    > your needs (freedom 1). Access to the full text of
    > the corpus is a
    > precondition for this.
    > * The freedom to redistribute copies so you can
    > help your neighbor
    > (freedom 2).
    > * The freedom to add texts or metadata or
    > annotations, and release
    > your improvements to the public, so that the whole
    > community benefits
    > (freedom 3).
    >
    > In most cases, any of the above freedoms may be
    > restricted by only
    > allowing the relevant freedoms in the context of
    > academic or
    > non-commercial research, though the precise terms of
    > these restrictions
    > may vary, and the boundaries of non-commercial may
    > not be easy to draw.
    >
    > Usually a corpus creator cannot simply release a
    > corpus under terms of
    > their choosing, allowing whichever of the above
    > freedoms they want to,
    > because they don't own the rights over all of the
    > texts contained in the
    > corpus. A corpus usually contains texts written or
    > spoken by various
    > people, and these people, or publishers, or
    > employers, or others, are
    > likely to have intellectual property rights over
    > these texts.
    > (Furthermore, the corpus builders acquire rights
    > over the
    > collection, but these may reside not in the
    > individuals but in their
    > institution or funders). To complicate things
    > further, the relevant laws
    > relating to these rights vary in different
    > countries, and have varied
    > over time.
    >
    > My colleague Lou Burnard asked a similar question on
    > this list in
    > January this year. You can see the start of the
    > thread in the archive at
    > http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0501&L=CORPORA&D=0&I=-3&P=13048
    > He was surprised to find virtually nothing which
    > could be distributed
    > under something like an open source software
    > licence.
    >
    > The simplest answer to this is that you have to say
    > a bit more precisely
    > what it is you want to be free to do with the
    > corpus, and then maybe
    > you'll get some more answers.
    >
    > Best wishes,
    > Martin
    >
    >
    > Delip Rao wrote:
    > > Hello All,
    > >
    > > Is there any freely available part-of-speech tagged
    > > corpus for research/non-commercial use?
    > >
    > > Thanks,
    > > Delip Rao
    > > -----------
    > > AIDB LAB,
    > > IIT MADRAS
    > >
    >
    >
    > --
    > Martin Wynne
    > Head of the Oxford Text Archive and
    > AHDS Literature, Languages and Linguistics
    >
    > Oxford University Computing Services
    > 13 Banbury Road
    > Oxford
    > UK - OX2 6NN
    > Tel: +44 1865 283299
    > Fax: +44 1865 273275
    > martin.wynne@oucs.ox.ac.uk
    >

            
            
                    



    This archive was generated by hypermail 2b29 : Thu Nov 17 2005 - 20:27:22 MET