Re: [Corpora-List] Legal aspects of compiling corpora

From: Mark Sanderson
Date: Fri Jun 13 2003 - 15:40:29 MET DST

    I think the honest answer is that it is a question with no clear answer.

    I know that legal concerns have prevented US government-funded projects
    such as TREC from building Web collections; instead, they have got other
    organisations to build and distribute such collections. I also know that
    Web search engines have been ordered to remove image and sound
    collections from their Web sites, but I don't think this has happened
    with HTML. Maybe text is viewed as generally less valuable than other
    media types.

    At 09:49 13/06/2003 -0300, J. L. De Lucca wrote:

    >Dear Linguists and Lawyers,
    >I am troubled by the legal aspects of compiling corpora. I am unsure
    >whether it is illegal to store web pages (or parts of them) in a
    >database that is not available to the public, and to display their
    >contents through a search method as short collocations of fewer than
    >100 characters at a time.
    >On the other hand, Internet search engines use cached (temporary?)
    >copies of sites and display short excerpts of the web pages.
    >Is my procedure wrong? What is the legal difference? Do I need to ask
    >each website for permission to store its pages? If I mention the source
    >and the author, will I be respecting the copyrights?
    >I look forward to hearing from you.
    >Yours Sincerely,
    >J. L. De Lucca

    Mark Sanderson, Room 303 Tel: +44 (0) 114 22 22648
    Department of Information Studies Fax: +44 (0) 114 27 80300
    University of Sheffield, Regent Court,
    211 Portobello St., Sheffield, S1 4DP, UK
    Good judgement comes from experience, experience comes from bad judgement

    This archive was generated by hypermail 2b29 : Fri Jun 13 2003 - 15:39:10 MET DST