Re: [Corpora-List] Auto-generation and how to spot it

From: Ramesh Krishnamurthy (r.krishnamurthy@aston.ac.uk)
Date: Mon Nov 13 2006 - 15:57:46 MET


    I don't know if it helps, but via Google I discovered that:

    >My eyes tell me that there are fabulous talents in every decade,
    >including this one

    is from
    http://www.hoopshype.com/columns/caste_hans.htm

    >You have to remember where these young guys were picked
    no hits
    >You know things are different when there's a press seat assigned to
    >someone representing lebronjames
    no hits
    >Like many sports, you are going to have writers who are too
    >close to the teams they cover and writers who aren't
    no hits
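
    For anyone who wants to repeat this check on other spam, here is a
    minimal sketch of the sentence-by-sentence lookup, assuming the
    passage can be split on full stops. The function names are
    hypothetical, and the script only builds the quoted search strings;
    it does not query any search engine.

```python
import re


def split_sentences(passage: str) -> list[str]:
    """Split a passage into sentences with a rough heuristic:
    break after '.', '!' or '?' when followed by whitespace."""
    text = " ".join(passage.split())  # collapse line breaks and extra spaces
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def exact_phrase_queries(passage: str) -> list[str]:
    """Turn each sentence into a double-quoted exact-phrase query string."""
    queries = []
    for sentence in split_sentences(passage):
        # Drop the final punctuation and any embedded double quotes so the
        # phrase can be wrapped in a single pair of quotes.
        phrase = sentence.rstrip(".!?").replace('"', "")
        queries.append(f'"{phrase}"')
    return queries


spam = (
    "My eyes tell me that there are fabulous talents in every decade, "
    "including this one. You have to remember where these young guys were "
    "picked. You know things are different when there's a press seat "
    "assigned to someone representing lebronjames. Like many sports, you "
    "are going to have writers who are too close to the teams they cover "
    "and writers who aren't."
)

queries = exact_phrase_queries(spam)
for query in queries:
    print(query)
```

    Each printed string can be pasted into a search engine by hand. A
    passage where only an occasional sentence has a hit, as above, is at
    least consistent with text stitched together from several sources.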

    Best
    Ramesh
    At 12:06 13/11/2006, you wrote:
    >"My eyes tell me that there are fabulous talents in every decade,
    >including this one. You have to remember where these young guys were
    >picked. You know things are different when there's a press seat
    >assigned to someone representing lebronjames. Like many sports, you
    >are going to have writers who are too close to the teams they cover
    >and writers who aren't."
    >
    >
    >This is the start of a spam which I (and presumably several thousand
    >other people) just received. My suspicion is that the text has been
    >automatically generated from a reasonably large corpus of authentic
    >email material (in this case, presumably, from some collection of
    >sports writing). The interesting question for this list is: how do I
    >know it's artificially generated? I'm guessing that the lack of
    >coherence has something to do with it, but what are the factors
    >which indicate that? And how much text would you need to scan before
    >determining that there was no natural coherence amongst its components?
    >
    >It's a question that several spam filter makers would probably pay
    >good money for an answer to.

    Ramesh Krishnamurthy

    Lecturer in English Studies, School of Languages and Social Sciences,
    Aston University, Birmingham B4 7ET, UK
    [Room NX08, North Wing of Main Building] ; Tel: +44 (0)121-204-3812 ;
    Fax: +44 (0)121-204-3766
    http://www.aston.ac.uk/lss/staff/krishnamurthyr.jsp

    Project Leader, ACORN (Aston Corpus Network): http://corpus.aston.ac.uk/


