[Corpora-List] Clean Enron Anyone?

From: peetm (peet.morris@comlab.ox.ac.uk)
Date: Fri Mar 18 2005 - 18:14:13 MET


    Greets!

    I'm wondering whether anyone has a 'cleaned' version of the Enron email
    corpus?

    In its raw state, most of the emails contain routing headers, footers,
    disclaimers, etc.; plus, IMHO, some of the emails are spam.

    If no one has a cleaned-up version, I am going to attempt the cleanup
    myself, so if anyone's interested in getting the output of that effort,
    please let me know.
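
    For illustration only, a rough sketch of what such a cleanup pass might
    look like in Python, using the standard-library email parser. The list of
    headers to keep and the footer/disclaimer patterns are assumptions made up
    for this example, not something taken from the corpus; a real pass would
    need patterns tuned on the actual messages, and spam filtering is not
    attempted here.

```python
import re
from email import message_from_string

# Hypothetical choice of headers worth keeping; everything else
# (Message-ID, X-* routing headers, etc.) is dropped.
KEEP_HEADERS = ("From", "To", "Date", "Subject")

# Hypothetical markers for where a footer or disclaimer starts.
# These are guesses for illustration, not a vetted list.
FOOTER_MARKERS = [
    re.compile(r"^\s*\*{5,}"),                                   # line of asterisks
    re.compile(r"^\s*This e-?mail .*confidential", re.IGNORECASE),
]


def clean_message(raw: str) -> dict:
    """Return the kept headers and the body text up to the first footer marker."""
    msg = message_from_string(raw)
    headers = {name: msg[name] for name in KEEP_HEADERS if msg[name] is not None}

    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart messages are skipped in this sketch.
        body = ""

    kept = []
    for line in body.splitlines():
        if any(pattern.search(line) for pattern in FOOTER_MARKERS):
            break  # treat everything from here on as footer/disclaimer
        kept.append(line)

    return {"headers": headers, "body": "\n".join(kept).strip()}
```

    Cutting at the first marker is deliberately crude: it will also discard
    any genuine text that follows a matching line, so the patterns would have
    to be checked against a sample of messages before being trusted.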

    Have a nice weekend,

    peetm


