Re: [Corpora-List] Google Books, copyrights, and corpora

From: Nathan Bauman (n.bauman@utoronto.ca)
Date: Wed Jun 14 2006 - 17:54:07 MET DST

    I'd be interested in hearing how Google is going to stop people from
    recreating texts. My gut feeling is that Google is in the wrong on this
    one.

    An anecdote: My old professor of Religious Studies, Martin Abegg, used
    precisely such a concordance to piece together the corpus of Dead Sea
    Scrolls for his Ph.D. dissertation. A private paper concordance had been
    produced by the team in charge of publishing the scrolls; a few copies of
    that concordance were lent to various institutions. The one that he used
    was freely available in the stacks of the library at Hebrew Union College.
    I remember him telling us that he used the concordance to piece together
    the texts because he needed just one text, an unpublished one, for his
    dissertation. After he had assembled the entire corpus of texts
    known at that time, he was strongly encouraged by various people to publish
    all of them, which he eventually did. He was sued, if memory serves, in
    both an Israeli court and an American one, but I cannot recall the outcome
    of either case. (Eventually, things worked out for him, as he
    ended up compiling the index volume to the official publication series some
    years later. A young undergrad, I was paid to check the English
    transliteration of names for the volume.) Anyway, good luck--and be
    careful.

    Nathan Bauman
    General English Program,
    Sookmyung Women's University
    Seoul, South Korea

    ----- Original Message -----
    From: "Mark Davies" <Mark_Davies@byu.edu>
    To: <corpora@hd.uib.no>
    Sent: Thursday, June 15, 2006 12:18 AM
    Subject: [Corpora-List] Google Books, copyrights, and corpora

    > Most of us are familiar with the Google Books initiative -- the project
    > that will digitize tens of millions of books from several leading
    > libraries (http://books.google.com/intl/en/googlebooks/about.html). Google
    > scans these books and then makes them searchable for end users via the
    > Web.
    >
    > For copyrighted works, the end users see only a "snippet" view -- similar
    > to what we linguists would call an entry in a KWIC display. This is the
    > line of text containing the word or phrase searched for, and maybe one
    > line of text before and one after.
    >
    > Google claims that although the entire text is (indexed) on the server,
    > the end user sees only very limited context, and there is therefore no
    > violation of US Fair Use Law. See
    > http://books.google.com/googlebooks/newsviews/legal.html for their legal
    > claims and http://fairuse.stanford.edu/ for US Fair Use Law.
    >
    > In 2005 Google was sued by the Association of American Publishers, which
    > claimed that the "snippet defense" is not adequate in this case (see
    > http://publishers.org/press/releases.cfm?PressReleaseArticleID=292). The
    > case is still in litigation.
    >
    > ---
    >
    > What are the implications of this for corpus creation and use? If Google
    > wins, does it mean that we can include *ANY* texts in a corpus, as long as
    > the end user only has access to short KWIC entries (especially if the
    > search interface prevents them from "chaining" these together to re-create
    > larger strings of text)? I guess I'm interested in this question right
    > now, as I'm considering the legal implications of using a particular text
    > collection (300+ million words) as part of a historical corpus of English.
    >
    > In the past, we've discussed copyright and we've discussed Google and
    > we've discussed Google copyright issues (see several CORPORA posts in June
    > 2003 relating to cached web pages). But this discussion was before Google
    > announced the Google Books initiative, and before they announced the
    > "snippet defense", which seems to have clear application to what we're
    > doing (or could do) with corpora.
    >
    > Any comments?
    >
    > =================================================
    > Mark Davies
    > Assoc. Prof., Linguistics
    > Brigham Young University
    > (phone) 801-422-9168 / (fax) 801-422-0906
    > http://davies-linguistics.byu.edu
    >
    > ** Corpus design and use // Linguistic databases **
    > ** Historical linguistics // Language variation **
    > ** English, Spanish, and Portuguese **
    > =================================================