Actually, what I meant was not exactly the same word, but a category of
words, such as hedges, that occurs at 2 or 5 per 1000 words. For example,
you could have the words "nearly", "seems", and "approximately" as hedges,
and I'd like to see whether differences in other such categories would be
noticed.
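To make the "per 1000" figure concrete, here is a minimal sketch of how a category rate could be computed. The function name, the tokenizing pattern, and the sample sentence are assumptions for illustration only; the three hedge words are the ones named above, and a real study would need a fuller, validated list.

```python
import re

# Illustrative hedge category: only the three words named in the message.
# A real analysis would use a much fuller list.
HEDGES = {"nearly", "seems", "approximately"}


def rate_per_thousand(text, category):
    """Return occurrences of category words per 1,000 running words."""
    # Crude tokenizer (an assumption): lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in category)
    return 1000.0 * hits / len(tokens)


# Hypothetical sample text, not taken from any corpus.
sample = "It seems that nearly all of the texts are approximately the same length."
print(round(rate_per_thousand(sample, HEDGES), 1))
```

Run over each text in a genre, this gives one rate per text for each category, which is the kind of figure (2 versus 5 per 1000) being compared here.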
I've tried Ken Litkowski's MCAA analysis, at his suggestion, on the texts,
but I am finding that features like hedges, intensifiers, and totality
markers are not marked as "content", it seems, and tend to end up in the
uncategorized section. This is a fascinating methodology, though, and it
may well be the method to pursue in getting at some of the differences. Has
there been other work on such interpretive markings in content analysis?
I'm afraid my background is more in discourse analysis than in NLP.
Tony's suggestions are great: to look at the differences between the units
that readers focus on more closely.
Thanks so much for all suggestions for further investigation.
Kristen Precht
Northern Arizona University
-----Original Message-----
From: James L. Fidelholtz [mailto:jfidel@siu.buap.mx]
Sent: Monday, March 22, 1999 9:49 AM
To: kprecht@iupui.edu
Cc: CORPORA@uib.no
Subject: Re: Corpora: Statistics in genre differences
On Fri, 19 Mar 1999, Kristen Precht wrote:
[snip]
>..., it's hard to assume that the reader would notice the difference
>between 2 per thousand words and 5 per thousand words.
Not only is it not hard, it is impossible to assume that they WOULDN'T
notice such a gross difference