[Corpora-List] Re: Google Books, copyrights, and corpora

From: Péter Halacs (peter@halacsy.com)
Date: Fri Jun 16 2006 - 18:54:28 MET DST

Next message: Smith, Nicholas: "RE: [Corpora-List] help with WS concordance search by syntax"

Previous message: Mark P. Line: "Re: [Corpora-List] Google Books, copyrights, and corpora"
In reply to: Mark Davies: "[Corpora-List] Google Books, copyrights, and corpora"

> What are the implications of this for corpus creation and use?
> If Google wins, does it mean that we can include *ANY* texts in a corpus,
> as long as the end user only has access to short KWIC entries
> (especially if the search interface prevents them from "chaining"
> these together to re-create larger strings of text)?

We've created a parallel corpus of English-Hungarian bitexts and
published it on the web after shuffling the texts:

"Some raw materials used for the Hunglish corpus are under copyright
(literature, film subtitles, magazines). We prevented the illegal use of
copyrighted material by shuffling the texts at sentence level. This form
is still useful for research purposes, while it does not infringe upon
the rightholders' interests. If you are a copyright holder, and you
consider the shuffled files infringing, please send email and we will
remove the material in question from the corpus.

The Hunglish corpus is open for use (with the above restrictions) under
a creative commons attributions licence."
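
For anyone wondering what sentence-level shuffling can look like in practice, here is a minimal sketch in Python. It is an illustration only, not the actual Hunglish tooling: the function name `shuffle_bitext`, the input layout (a list of documents, each a list of aligned English-Hungarian sentence pairs), and the choice to pool all documents before shuffling are assumptions made for the example.

```python
import random


def shuffle_bitext(documents, seed=None):
    """Pool aligned sentence pairs from all documents and shuffle them.

    documents: iterable of documents, each a list of (english, hungarian)
               sentence pairs (hypothetical layout, assumed for this sketch).
    seed:      optional seed, useful only for reproducible testing.

    Each pair stays intact, so the alignment survives, but the original
    order of sentences within and across documents is lost.
    """
    # Flatten all documents into a single list of aligned pairs.
    pairs = [pair for doc in documents for pair in doc]
    # Use a private generator so the global random state is untouched.
    rng = random.Random(seed)
    rng.shuffle(pairs)
    return pairs


if __name__ == "__main__":
    docs = [
        [("Good morning.", "Jó reggelt."), ("Thank you.", "Köszönöm.")],
        [("Good night.", "Jó éjszakát.")],
    ]
    for english, hungarian in shuffle_bitext(docs):
        print(english, "\t", hungarian)
```

One caveat if you try this: with a fixed seed the permutation is reproducible, so anyone who knows the seed and the number of sentences could undo the shuffle. For a published corpus it is probably safer to leave `seed` unset and not record it.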

peter

