RE: [Corpora-List] free tagged corpus

From: Adam Kilgarriff (adam@lexmasterclass.com)
Date: Thu Nov 17 2005 - 17:07:56 MET

    At risk of adding more complexity than anyone wants, here is another option:
    Freedom to provide a web interface to a corpus.

    If I provide a web interface to a corpus, I am doing something rather less
    than redistributing the corpus. I am giving my users another flavour of
    "freedom 0", rather than "freedom 1".

    I am also doing what Google and Yahoo do in relation to the corpus that is
    the web. (They neither pay anything to data owners nor even ask
    permission.)

    Adam

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of David Graff
    Sent: 17 November 2005 15:03
    To: CORPORA@UIB.NO
    Subject: Re: [Corpora-List] free tagged corpus

    martin.wynne@oucs.ox.ac.uk said:
    > With corpora, a parallel classification may be possible:
    >
    > * The freedom to access and analyse the corpus (freedom 0).
    > * The freedom to run your own tools on the corpus, and adapt it to
    > your needs (freedom 1). Access to the full text of the corpus is a
    > precondition for this.
    > * The freedom to redistribute copies so you can help your neighbor
    > (freedom 2).
    > * The freedom to add texts or metadata or annotations, and release
    > your improvements to the public, so that the whole community benefits
    > (freedom 3).

    Regarding "freedom 3" (the last point), there can be one important
    difference between corpora and software. For many kinds of corpus
    research, it's possible to circulate metadata and annotations in
    "stand-off" form: instead of including the corpus data with the
    annotations, you include indexing information (file name, document ID,
    byte offset, etc.) that cites a reference release of the corpus data.
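
    As a rough illustration of the stand-off idea, here is a minimal Python
    sketch. Everything in it is an assumption made for the example: the
    record fields, the release name "demo-1.0", the document ID "doc001" and
    the sample sentence are hypothetical and do not come from any real corpus
    release. The annotation records carry only pointers and labels; the text
    is recovered from a local copy of the corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffAnnotation:
    """One annotation that points into a corpus instead of copying its text."""
    release: str  # reference release of the corpus the offsets refer to
    doc_id: str   # document ID within that release
    start: int    # byte offset of the first byte of the span
    end: int      # byte offset one past the last byte of the span
    label: str    # the annotation itself, e.g. a part-of-speech tag


def resolve(annotation, corpus):
    """Look up the annotated text in a local copy of the corpus.

    `corpus` maps document IDs to raw bytes. Only someone who already
    holds the full text can perform this step.
    """
    raw = corpus[annotation.doc_id]
    return raw[annotation.start:annotation.end].decode("utf-8")


# Hypothetical local copy of the reference release (illustrative data only).
corpus = {"doc001": "The cat sat on the mat.".encode("utf-8")}

# The distributable part: offsets and labels, no corpus text.
annotations = [
    StandoffAnnotation("demo-1.0", "doc001", 0, 3, "DT"),
    StandoffAnnotation("demo-1.0", "doc001", 4, 7, "NN"),
    StandoffAnnotation("demo-1.0", "doc001", 8, 11, "VBD"),
]

# Joining annotations back to the text requires the corpus itself.
tagged = [(resolve(a, corpus), a.label) for a in annotations]
```

    The list of annotation records could be circulated on its own; the final
    join only works for a reader who holds the same release of the corpus,
    since the offsets are meaningless against a different version of the text.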

    Obviously, the only people who can make use of stand-off annotations are
    those who already have or can get "freedom 1" (access to full text) for the
    given corpus. (Or maybe there are ways to make these annotations work for
    people who only have "freedom 0"?)

    In any case, many researchers can contribute to the community in this way,
    and many others can benefit, without risking property-rights infringements:
    given that the annotations do not contain a replication of the corpus,
    ownership of the annotations (and the choice of whether/how to distribute
    them) resides with the annotation creator, and is not limited in any direct
    way by the distribution constraints of the corpus.

            David Graff


