Re: [Corpora-List] Phishing email corpus

From: radev@umich.edu
Date: Sun Apr 30 2006 - 06:41:58 MET DST

    I don't know him. I have been collecting these myself after I read the
    paper in Scientific American:

    @article{bennett&al.03,
      author = {Bennett, Charles H. and Li, Ming and Ma, Bin},
      title = {{Chain Letters and Evolutionary Histories}},
      journal = {{Scientific American}},
      month = {June},
      year = {2003},
      pages = {76--81},
      url = {http://www.sciam.com/article.cfm?colID=1&articleID=0003D476-1852-1EB7-BDC0809EC588EEDF},
    }

    j_kurjian@hotmail.com wrote:
    >
    > Well, Lawrence Kestenbaum is in Michigan somewhere so you might have more
    > luck than I did. He has quite a few on his site, but he claims to have 15
    > or 20 thousand on his hard drive!
    > J
    >
    >
    >
    > >
    > >I have a larger collection of "Nigerian" Letters, more than 2,500 of
    > >them, collected since 1998. If anyone is interested, drop me a note.
    > >
    > >D.
    > >
    > >j_kurjian@hotmail.com wrote:
    > > >
    > > > Nicklas -
    > > > I don't know if this is what you're looking for but I have a collection of
    > > > "Nigerian Letters," about 100 of them. They are not tagged. If you are
    > > > handy with a spider or offline browser, you might be able to get some from:
    > > > http://potifos.com/fraud/
    > > > Last year I contacted its owner, Lawrence Kestenbaum, and he almost agreed
    > > > to make his massive collection available to me for a corpus. Then it fell
    > > > through. His site has lotto letters and probably other types too. Hope
    > > > that helps. Let me know if you want what I have.
    > > > Jerry Kurjian
    > > >
    > > >
    > > > >
    > > > >Hi
    > > > >
    > > > >My name is Nicklas Karlsson, and I'm a student at Växjö University in
    > > > >Sweden.
    > > > >I'm working on my bachelor's degree with a phishing detection and warning
    > > > >project.
    > > > >The part I'm working on is a classification module, which will classify
    > > > >emails to find the phishing emails and mark them for further investigation.
    > > > >To develop this and test it I need a collection of phishing emails. I've
    > > > >collected a few but I still need more.
    > > > >
    > > > >So I wonder if anyone has or knows where I can find a corpus with phishing
    > > > >emails?
    > > > >
    > > > >I'll post a list of all replies sent directly to me.
    > > > >
    > > > >Thanks
    > > > >
    > > > >Nicklas Karlsson
    > > > >
    > > > >
    > > >
    > > >
    > > >
    > > >
    > > >
    > >
    > >
    > >--
    > >Dragomir R. Radev radev@umich.edu
    > >Associate Professor of Information, Electrical Engineering and
    > >Computer Science, and Linguistics, the University of Michigan, Ann Arbor
    > >Phone: 734-615-5225 Fax: 734-764-2475 http://www.si.umich.edu/~radev
    >
    >
    >
    >
    >

    -- 
    Dragomir R. Radev                                         radev@umich.edu
    Associate Professor of Information, Electrical Engineering and
    Computer Science, and Linguistics, the University of Michigan, Ann Arbor
    Phone: 734-615-5225   Fax: 734-764-2475    http://www.si.umich.edu/~radev
    



    This archive was generated by hypermail 2b29 : Sun Apr 30 2006 - 06:40:34 MET DST