Re: [Corpora-List] license question

From: John F. Sowa (sowa@bestweb.net)
Date: Fri Aug 18 2006 - 20:13:28 MET DST

    There is a serious problem with that approach:

    SS> This is why I advocate the procedure of distributing an
    > Internet-derived corpus as a list of URLs.

    Unfortunately, URLs are subject to two limitations:

      1. They become "broken" whenever the web site or the
         directory structure is changed.

      2. Even when the URL is live, the content can be updated
         and changed at any time.

    These two points make a collection of URLs a highly unstable
    way to assemble or distribute a corpus. An analysis performed
    at one time cannot be reliably compared with an analysis
    performed at another, because the two may not be working from
    the same texts.

    John Sowa
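
The two limitations above can be made concrete with a small check. What follows is only a minimal sketch, not something proposed in this thread: it assumes the corpus was distributed as a manifest mapping each URL to a SHA-256 digest recorded at collection time. The names `fetch_bytes`, `digest`, and `check_manifest`, and the manifest format itself, are hypothetical.

```python
import hashlib
import urllib.request
from typing import Callable, Dict


def fetch_bytes(url: str) -> bytes:
    """Download the raw bytes behind a URL; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of the downloaded content."""
    return hashlib.sha256(content).hexdigest()


def check_manifest(
    manifest: Dict[str, str],
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Dict[str, str]:
    """Classify each URL as 'broken', 'changed', or 'unchanged'.

    manifest maps URL -> digest recorded when the corpus was collected
    (an assumed format, for illustration only).
    """
    status = {}
    for url, recorded in manifest.items():
        try:
            content = fetch(url)
        except OSError:
            # Limitation 1: the URL no longer resolves.
            # urllib's URLError and HTTPError are subclasses of OSError.
            status[url] = "broken"
            continue
        # Limitation 2: the URL is live, but the content may differ.
        status[url] = "unchanged" if digest(content) == recorded else "changed"
    return status
```

Calling `check_manifest(manifest)` at a later date reports which parts of the URL list still correspond to the text that was originally analysed. Note that such a check only detects the drift; it cannot recover the original content.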


