Re: [Corpora-List] Google Books, copyrights, and corpora

From: Chris Brew (cbrew@acm.org)
Date: Wed Jun 14 2006 - 20:17:03 MET DST

    The technical question is: "Is it possible to reconstruct the full text
    from concordance snippets?" The answer depends on how the
    snippets are selected. The answer will be "yes" if, for every token in
    the full text, there is some query that would return that token, along
    with enough context to allow the snippets to be sewn back
    together. You would be about as certain that the text was right as you
    are when you solve a cryptogram. While this is less than complete
    mathematical certainty, it would probably convince a judge. The answer
    might be "no" if there are enough tokens that Google can guarantee
    will never appear in a snippet.
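
    To make the "sewing back together" concrete, here is a rough sketch in
    Python. It is only an illustration under my own assumptions, not a
    description of how Google selects or serves snippets: snippets are
    plain whitespace-tokenized strings, two snippets are joined only when
    the end of one repeats the start of the other for at least
    min_overlap tokens, and the names stitch and overlap are made up for
    this example. A snippet that sits wholly inside another, or that shares
    no overlap with anything, is simply left as a separate piece.

```python
def overlap(a, b, min_overlap):
    """Length of the longest suffix of a equal to a prefix of b.

    Returns 0 if no match of at least min_overlap tokens exists.
    """
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def stitch(snippets, min_overlap=2):
    """Greedily merge snippets that overlap at their edges.

    Assumes min_overlap >= 1. Returns the list of pieces that remain
    once no further merge is possible; a single piece means every
    snippet could be joined into one text.
    """
    pieces = [s.split() for s in snippets]
    while True:
        best_k, best_i, best_j = 0, None, None
        # Find the ordered pair with the longest suffix/prefix overlap.
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break
        merged = pieces[best_i] + pieces[best_j][best_k:]
        pieces = [p for n, p in enumerate(pieces)
                  if n not in (best_i, best_j)] + [merged]
    return [" ".join(p) for p in pieces]


if __name__ == "__main__":
    print(stitch([
        "the quick brown fox",
        "brown fox jumps over",
        "jumps over the lazy dog",
    ]))
```

    The larger min_overlap is, the less likely two snippets are joined by
    accident, which is the cryptogram-like kind of certainty I mean above.
    If enough tokens never appear in any snippet, the procedure stalls
    with several disconnected pieces, which corresponds to the "no" case.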

    As for the legal question, even a decision in Google's favor might be
    narrowly drawn, in which case we would be on legally dangerous ground
    were we to assume that we can do something just because it seems to us
    similar enough to what Google would (hypothetically) be allowed to
    do. Lawyers have training that allows them to make intelligent
    guesses about things like this, but even they have few firm
    precedents to go on. My guess is that a lawyer would advise caution,
    at least for now, simply because it is unclear what judges will
    eventually decide, if and when such a case comes to court. That, however,
    is just a guess.
