Re: [Corpora-List] mailing list corpora

From: Niels Ott (niels@drni.de)
Date: Fri Jun 16 2006 - 00:10:45 MET DST

    Adam ENDRODI wrote:
    > Thoughts, hints? Have you run into similar problems or indeed I am
    > the only one to miss the obvious?

    Here's my idea:

    - Work on Usenet data.
    - Do not use archives. If you take postings from a larger
      number of high-traffic groups, you should easily get
      your 10,000 postings.
    - Use Mozilla Thunderbird.
    - Create a newsgroup account and subscribe to a number of
      groups.
    - For each group:
         - Download a lot of headers (you will be asked
           when you click on the group's name for the first
           time).
         - Go to the menu "Edit" -> "Newsgroup Properties",
           click on the "Offline" tab, then click the "Download
           now" button.
         - Wait. (This can take a while...)
    - Result: In ~/.thunderbird/<someID>/News/<newsaccountname>
      you will find an mbox file for each newsgroup.
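
    Once the mbox files exist, the postings can be pulled out with a
    short script. The following is only a rough sketch, assuming the
    files really are plain mbox and using Python's standard "mailbox"
    module. The function names iter_postings and collect_postings are
    made up for illustration, and the paths you pass in depend on your
    own profile directory.

```python
import mailbox


def iter_postings(mbox_path):
    """Yield (subject, body_text) for each message in one mbox file."""
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for msg in box:
            subject = str(msg.get("Subject", ""))
            body = ""
            # Use the first text/plain part; skip HTML and attachments.
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    payload = part.get_payload(decode=True) or b""
                    charset = part.get_content_charset() or "utf-8"
                    try:
                        body = payload.decode(charset, errors="replace")
                    except LookupError:
                        # Unknown charset label: fall back to UTF-8.
                        body = payload.decode("utf-8", errors="replace")
                    break
            yield subject, body
    finally:
        box.close()


def collect_postings(mbox_paths, limit=10000):
    """Gather up to `limit` postings from several mbox files."""
    postings = []
    for path in mbox_paths:
        for posting in iter_postings(path):
            if len(postings) >= limit:
                return postings
            postings.append(posting)
    return postings
```

    You would call collect_postings() with the list of mbox files
    found under the News directory mentioned above. Quoted text and
    signatures are left in the bodies, so further cleaning is still
    up to you.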

    Best,

       Niels

    (Still CL Student at Tübingen Univ.)

    --
    Me & Myself: http://www.drni.de/niels/
    "Freedom's just another word for nothing left to lose..." (Janis Joplin)