Authorship testing

Ken Litkowski (71520.307@compuserve.com)
04 Feb 96 18:54:57 EST

Techniques of content analysis have been used successfully in some authorship
attribution studies. Don McTavish & Ellen Pirro wrote a paper, "Contextual
Content Analysis", in "Quality & Quantity" (1990) describing a technique that
categorizes words into 116 semantic categories across 4 contexts (characterized
as traditional, practical, emotional, and analytic). The technique is known as
Minnesota Contextual Content Analysis (MCCA).

The distribution of categories has been normed using the Brown corpus, so that
texts are then analyzed against this "usual" distribution to determine how far
they differ in emphasis (the categories) and context. Texts are analyzed and
clustered using non-agglomerative techniques, including discriminant analysis.
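
To make the idea of scoring a text against a normed distribution concrete, here
is a minimal Python sketch. It is not the MCCA implementation: the dictionary,
the category names, the norm values, and the function names are all hypothetical
stand-ins, and the simple "observed proportion minus norm proportion" score is
an assumption chosen for illustration, since the actual MCCA scoring formulas
are not given here.

```python
import re
from collections import Counter

# Hypothetical toy dictionary mapping words to semantic categories.
# (The real MCCA dictionary has 116 categories; none are reproduced here.)
TOY_DICTIONARY = {
    "home": "place",
    "room": "place",
    "nurse": "role",
    "staff": "role",
    "happy": "feeling",
    "afraid": "feeling",
}

# Hypothetical norm: expected share of each category among categorized words,
# standing in for a distribution normed on a reference corpus.
TOY_NORM = {"place": 0.5, "role": 0.3, "feeling": 0.2}


def category_proportions(text, dictionary):
    """Return the share of categorized words falling in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(dictionary[w] for w in words if w in dictionary)
    total = sum(counts.values())
    categories = set(dictionary.values())
    if total == 0:
        return {cat: 0.0 for cat in categories}
    return {cat: counts.get(cat, 0) / total for cat in categories}


def emphasis_scores(text, dictionary, norm):
    """Assumed scoring: observed proportion minus the norm proportion."""
    observed = category_proportions(text, dictionary)
    return {cat: observed.get(cat, 0.0) - norm[cat] for cat in norm}


scores = emphasis_scores(
    "The nurse was happy; the staff were happy at home.",
    TOY_DICTIONARY,
    TOY_NORM,
)
for category, score in sorted(scores.items()):
    print(f"{category:8s} {score:+.2f}")
```

A positive score means the text emphasizes that category more than the norm, a
negative score less. Vectors of such scores, one per text, are the kind of
input that a clustering or discriminant analysis step could then work on.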

In a recent paper (Sep. 95) presented at the Society for Conceptual and Content
Analysis by Computer, "A Computer Content Analysis Approach to Measuring Social
Distance in Residential Organizations for Older People", McTavish and I described
the use of MCCA. This study analyzed free text from nursing home managers, staff,
and residents, discriminating among them so that it was possible
to identify not only the organizational role of an unknown text but also the
specific nursing home where the person was located. (Write McTavish at
mctavish@atlas.socsci.umn.edu)

MCCA was applied to careful English translations of the poetry of Pablo Neruda
(for a fest held in the U.S.). Persons then attempted to emulate this poetry.
MCCA not only distinguished these authors but also discriminated "phases" in
Neruda's work.

MCCA is available on a mainframe at the University of Minnesota and has been
implemented in a beta-test version in DIMAP (software for creating and
maintaining lexicons for natural
language processing), which will be used for extending the MCCA dictionary. The
technique has strong affinities to "semantic vectors" developed by Liz Liddy at
Syracuse for information retrieval (except that her work uses the LDOCE subject
labels). The semantic categories here are more similar to the "semantic fields"
developed by Eugene Nida in connection with biblical translation studies.

Ken Litkowski
CL Research
20239 Lea Pond Place
Gaithersburg, MD 20879-1270 USA
TEL.: 301-926-5904
EMAIL: INTERNET> 71520.307@compuserve.com