[Corpora-List] E-mail corpora?

From: Dean Jones (dean.m.jones@gmail.com)
Date: Wed Sep 20 2006 - 21:22:32 MET DST

    Hello all,

    I'm looking for collections of e-mails which would be suitable for
    training some NLP tools, and wondered if anyone on this list could
    point me in the right direction. We're mainly interested in training
    categorisation tools, but are also interested in performing other
    kinds of analysis (e.g. POS tagging, named-entity extraction) to
    compare the performance of our tools on e-mails and other kinds of
    documents.

    I know about the Enron corpus and a couple of spam corpora
    (SpamAssassin, TREC Spam Track) - is there anything I'm missing? As
    this is for a commercial project, I'm interested in hearing about both
    free and commercial corpora. Our immediate interest is in
    English-language documents, but other languages would also be of
    longer-term interest.

    Many thanks for any pointers,

    Dean.


