[Corpora-List] Re: Google Books, copyrights, and corpora

From: Péter Halacs (peter@halacsy.com)
Date: Fri Jun 16 2006 - 18:54:28 MET DST

    > What are the implications of this for corpus creation and use?
    > If Google wins, does it mean that we can include *ANY* texts in a corpus,
    > as long as the end user only has access to short KWIC entries
    > (especially if the search interface prevents them from "chaining"
    > these together to re-create larger strings of text)?

    We've created a parallel corpus of English-Hungarian bitexts and
    published it on the web after shuffling the texts:

    "Some raw materials used for the Hunglish corpus are under copyright
    (literature, film subtitles, magazines). We prevented the illegal use of
    copyrighted material by shuffling the texts at sentence level. This form
    is still useful for research purposes, while it does not infringe upon
    the rightholders' interests. If you are a copyright holder, and you
    consider the shuffled files infringing, please send email and we will
    remove the material in question from the corpus.

    The Hunglish corpus is open for use (with the above restrictions) under
    a creative commons attributions licence."
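
A minimal sketch of what sentence-level shuffling of a bitext could look like, assuming the corpus is held as a list of aligned (English, Hungarian) sentence pairs. The function name `shuffle_bitext`, the `seed` parameter, and the sample sentences are illustrative assumptions, not the actual Hunglish tooling.

```python
import random


def shuffle_bitext(pairs, seed=None):
    """Return the aligned sentence pairs in random order.

    Each pair is moved as a unit, so the sentence alignment survives
    while the original running text can no longer be read off in order.
    """
    rng = random.Random(seed)   # seed=None draws from system entropy
    shuffled = list(pairs)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled


# Made-up sample data (not taken from the corpus).
pairs = [
    ("Good morning.", "Jó reggelt."),
    ("Thank you.", "Köszönöm."),
    ("See you tomorrow.", "Viszlát holnap."),
    ("Where is the station?", "Hol van az állomás?"),
]

shuffled = shuffle_bitext(pairs, seed=0)
for english, hungarian in shuffled:
    print(english, "\t", hungarian)
```

The fixed seed here is only for repeatable demonstration. For a published corpus it would probably be safer to leave `seed` unset, since anyone who knows the seed and the number of sentences could recompute the permutation and undo it.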

    peter


