[Corpora-List] Auto-generation and how to spot it

From: Lou Burnard (lou.burnard@computing-services.oxford.ac.uk)
Date: Mon Nov 13 2006 - 13:06:52 MET

Next message: Yorick Wilks: "Re: [Corpora-List] Auto-generation and how to spot it"

Previous message: Mark Stevenson: "[Corpora-List] Research Associate in Natural Language Processing, University of Sheffield"
Next in thread: Yorick Wilks: "Re: [Corpora-List] Auto-generation and how to spot it"
Reply: Yorick Wilks: "Re: [Corpora-List] Auto-generation and how to spot it"
Reply: Diana Maynard: "Re: [Corpora-List] Auto-generation and how to spot it"
Reply: Ramesh Krishnamurthy: "Re: [Corpora-List] Auto-generation and how to spot it"

"My eyes tell me that there are fabulous talents in every decade,
including this one. You have to remember where these young guys were
picked. You know things are different when there's a press seat
assigned to someone representing lebronjames. Like many sports, you are
going to have writers who are too close to the teams they cover and
writers who aren't."

This is the start of a spam message which I (and presumably several thousand
other people) just received. My suspicion is that the text has been
automatically generated from a reasonably large corpus of authentic
email material (in this case, presumably, from some collection of sports
writing). The interesting question for this list is: how do I know it's
artificially generated? I'm guessing that the lack of coherence has
something to do with it, but what are the factors which indicate that?
And how much text would you need to scan before determining that there
was no natural coherence amongst its components?
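
One crude way to make "lack of coherence" measurable is to check how many
content words each sentence shares with the one before it. The Python below
is only a sketch of that idea, not a method anyone on the list has proposed:
the function names, the tiny stopword list, and the choice of Jaccard overlap
are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real experiment would need a much fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "you",
    "that", "this", "there", "who", "they", "have", "too", "me", "my",
}

def sentences(text):
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def content_words(sentence):
    # Lowercased word types, minus stopwords.
    words = re.findall(r"[a-z']+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}

def jaccard(a, b):
    # Shared types divided by all types; 0.0 when both sets are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def adjacent_overlap(text):
    # One score per pair of neighbouring sentences.
    sets = [content_words(s) for s in sentences(text)]
    return [jaccard(x, y) for x, y in zip(sets, sets[1:])]

def mean_overlap(text):
    # Average neighbour overlap; 0.0 if there are fewer than two sentences.
    scores = adjacent_overlap(text)
    return sum(scores) / len(scores) if scores else 0.0
```

Calling `mean_overlap` on the quoted passage and on a comparable stretch of
genuine sports writing would give two numbers to compare, but plenty of
authentic prose also has little word overlap between neighbouring sentences,
so no threshold is suggested here. The second question, how much text is
needed, could be explored by watching how the score settles as more sentences
are added.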

It's a question that several spam filter makers would probably pay good
money for an answer to.

This archive was generated by hypermail 2b29 : Mon Nov 13 2006 - 12:56:14 MET