Re: [Corpora-List] Legal aspects of compiling corpora

From: Susana Sotillo (sotillos@mail.montclair.edu)
Date: Tue Jun 17 2003 - 17:14:22 MET DST


    Torzec Nicolas ATER LSI wrote:

    > Dear Linguists and Lawyers,
    > I have the same "problem" with a large (tagged) monitor corpus of
    > texts from French-language online forums:
    > - these messages are publicly available in the sense that everybody
    > can read and reuse them

    The key term here is "publicly" available.

    > - each newsgroup server stores and uses its own copies of them
    > - search engines use and exploit cached copies of them
    > - ...
    >
    > So,
    > - Is it illegal to store these messages, in an anonymous
    > way, in a database?

    Why should it be illegal if none of the participants is identified? I have
    also downloaded and stored hundreds of chat messages from bulletin boards
    and "notified" the owners of the bulletin boards. Fortunately, one had
    deleted all its messages when it changed its format. I do not delete
    politicians' names. In the US, you can write and say things about people in
    public office, and they cannot sue you unless you deliberately accuse
    someone of stealing or doing something improper without any proof. If you
    defame them knowing that what you are saying is false, they can certainly
    sue you for slander or libel.

    > - Is it illegal to exploit this corpus for research
    > purposes? (i.e. to carry out linguistic studies and to develop NLP
    > tools using corpus-based machine learning methods)

    This falls under fair use, at least in the US.

    > - Is it illegal to illustrate scientific articles with
    > examples from this corpus?

    You need a lawyer to clarify this.
    >
    > Do I need to ask each author for permission to store and use his or her
    > messages? What if I mention the source and the author? What about
    > copyright?

    If you identify the chat list or bulletin board and use the participants'
    real names, you ought to ask permission to do so. Copyrights are usually
    held by the owners of the chat list or bulletin board.
    >
    > Moreover,
    > - What if I want to make my corpus publicly available to researchers?
    > - What if NLP tools developed from this corpus are to be integrated
    > in commercial products?

    This is where things become problematic. I am all in favor of "open
    architecture" and sharing knowledge, but when people decide to charge for
    their products, we have all kinds of problems. (The "greed" or profit
    factor.) I would prefer to create my own "specialized corpus" and share my
    findings with others. Unfortunately, you cannot "generalize" findings based
    on specialized corpora.
    >
    > Thank you in advance for your help...
    > References, pointers and suggestions are welcome, especially on the
    > legal aspects for France...

    Sorry, I know nothing about French copyright laws.
    >
    > Nicolas Torzec
    >
    > --
    > Nicolas Torzec
    > PhD Student in NLP processing
    > --
    >
    > delucca@nilc.icmc.usp.br wrote:
    > >
    > > Dear Linguists and Lawyers,
    > >
    > > I am troubled by the legal aspects of compiling corpora. I am unsure
    > > whether it is illegal to store web pages (or parts of them)
    > > in a database (see http://www.dictionarium.com/project.htm)
    > > that is not available to the public, and to display their contents as
    > > short collocations of fewer than 100 characters at a time via a search.
    > >
    > > On the other hand, Internet search engines use cached (temporary?)
    > > copies of sites and display a short excerpt of the web pages.
    > >
    > > Is my procedure wrong? What is the legal difference? Do I need to ask
    > > each website for permission to store its pages? If I mention the source
    > > and the author, will I be respecting the copyright?
    > >
    > >
    > > I look forward to hearing from you.
    > >
    > > Yours Sincerely,
    > >
    > > J. L. De Lucca
    > >
    >
    > --
    > Nicolas TORZEC
    >
    > ENSSAT / Université de Rennes 1
    > 6, rue de Kerampont
    > 22300 Lannion
    >
    > Mel : nicolas.torzec@enssat.fr
    > Tel : 02.96.46.27.30
    > Fax : 02.96.37.01.99
    > Web : http://www.enssat.fr
    > --


