[Corpora-List] A few questions concerning WordSmith 4.0

From: Georg Marko (georg.marko@uni-graz.at)
Date: Mon Nov 27 2006 - 20:27:13 MET

Next message: Shane Axtell: "[Corpora-List] Dictionaries/Lexical Databases"

Previous message: Anna Feldman: "[Corpora-List] 2nd CFP: HLT/NAACL-07 Workshop on Computational Approaches to Figurative Language [CORRECTED WORKSHOP URL]"

Dear all,

I have some questions about working with Concord in WordSmith 4.0
(please excuse my incompetence in case I have overlooked something obvious).

In version 3.0 I used the Collocation function to look for words with a
particular suffix or ending (I say "ending" because the string may cover
more or less than a real morpheme). For this purpose I used a truncated
search, e.g. "*ism", and then set the collocation horizon to 0/0.
Strictly speaking, the programme then did not calculate collocations
(words appearing to the left or right of the search string) but simply
produced a list of the centre words. As this list covered all -ism words
("intellectualism", "capitalism", "occultism", etc.), it was exactly
what I wanted.
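Outside WordSmith, the result of a "*ism" search with a 0/0 horizon (a frequency list of the matching centre words, shown in full) can be approximated with a few lines of Python. This is a minimal sketch, not WordSmith's behaviour: the helper name `suffix_frequencies` is hypothetical, and tokenisation is assumed to be plain lower-cased alphabetic strings.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters a-z after lower-casing.
TOKEN = re.compile(r"[a-z]+")

def suffix_frequencies(text: str, suffix: str) -> Counter:
    """Hypothetical helper: count every word type ending in `suffix`.

    Mimics a truncated search such as "*ism" with a 0/0 horizon,
    i.e. only the matching centre words are listed, in full.
    """
    tokens = TOKEN.findall(text.lower())
    return Counter(t for t in tokens if t.endswith(suffix))

if __name__ == "__main__":
    sample = "Capitalism and occultism; capitalism again. Intellectualism!"
    print(suffix_frequencies(sample, "ism").most_common())
    # [('capitalism', 2), ('occultism', 1), ('intellectualism', 1)]
```

Because the full word form is kept as the dictionary key, the list never truncates words to their first letters.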

In version 4.0, however, I can no longer choose a zero horizon, either
to the left or to the right. This can be worked around by clicking on
the centre column in the Collocations display, which sorts the words by
how often they occur as the centre word and so gives me the same results
as the procedure just described for 3.0. My problem is that the
Collocation function does not show the full form of the centre word,
only its first two letters (e.g. "in", "ca", "oc"). If there are only a
few different words, I may be able to guess them, but in other cases
this is impossible. As I am not the most intelligent or sophisticated
user of WordSmith, I doubt that this problem stems from unusually
challenging demands; more likely I am missing some setting or option.
But I cannot find what I am missing.

A second, probably equally simple problem is that I cannot seem to
exclude words when working with a truncated search word. For example,
when looking for synthetic comparatives in English using "*er" as my
target, I would like the programme to ignore obvious high-frequency
words such as "ever", "never", "her", "after", etc. This was possible in
WordSmith 3.0, but I cannot find the equivalent function in version 4.0.
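The kind of exclusion described here amounts to filtering the matches of a truncated search against a stop list. The Python sketch below illustrates that idea only; it does not reproduce any WordSmith setting, and both the helper `filtered_suffix_matches` and the contents of `EXCLUDE` are hypothetical examples.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters a-z after lower-casing.
TOKEN = re.compile(r"[a-z]+")

# Hypothetical stop list for a "*er" comparative search; extend as needed.
EXCLUDE = {"ever", "never", "her", "after"}

def filtered_suffix_matches(text: str, suffix: str, exclude: set) -> Counter:
    """Count words ending in `suffix`, skipping any form listed in `exclude`."""
    tokens = TOKEN.findall(text.lower())
    return Counter(t for t in tokens if t.endswith(suffix) and t not in exclude)

if __name__ == "__main__":
    sample = "She never looked better after her longer walk, though ever taller."
    print(sorted(filtered_suffix_matches(sample, "er", EXCLUDE)))
    # ['better', 'longer', 'taller']
```

A stop list only removes the forms it names; other non-comparatives ending in "-er" (e.g. "water", "number") would still need to be added or checked by hand.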

The third problem concerns the use of search files. In examining corpora
that concentrate on particular discourses (e.g. women's magazines,
wellness brochures, popular books promoting lifestyle changes), I have
started to use files containing fairly exhaustive lists of particular
lexical fields, e.g. nutrients, social relations, diseases. This allows
me to compare the extent to which a specific discourse focuses, for
instance, on the nutritional aspects of food or takes a rather
pathological view of life (at least on a superficial level). I have now
compiled a large list of pathological terminology from internet
resources and some initial searches; it covers some 4,000 expressions.
I was not really surprised that WordSmith could not finish checking for
the occurrence of these expressions in a 600,000-word corpus,
considering that my computer is not particularly fast. I was just
wondering whether there is a practical limit on the size of a search
file (say, 500 lines or so) up to which such searches can still be
performed successfully on a moderately fast computer.
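How WordSmith processes search files internally is not stated here, so the following is only a general illustration, under stated assumptions, of why a long term list need not in principle be slow: if every expression is stored as a tuple of tokens in a hash set, the corpus can be scanned once, and each position needs only as many lookups as the longest expression has words, whatever the number of entries. The helper names `load_terms` and `count_terms` and the simple alphabetic tokenisation are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters a-z after lower-casing.
TOKEN = re.compile(r"[a-z]+")

def load_terms(lines):
    """Hypothetical helper: one expression per line -> set of token tuples."""
    terms = set()
    for line in lines:
        toks = tuple(TOKEN.findall(line.lower()))
        if toks:
            terms.add(toks)
    return terms

def count_terms(text, terms):
    """Count every listed expression in a single pass over the text."""
    tokens = TOKEN.findall(text.lower())
    max_len = max((len(t) for t in terms), default=0)
    hits = Counter()
    for i in range(len(tokens)):
        # Check the n-grams starting at i, up to the longest term length;
        # each check is a constant-time set lookup.
        for n in range(1, max_len + 1):
            gram = tuple(tokens[i:i + n])
            if len(gram) < n:  # ran past the end of the text
                break
            if gram in terms:
                hits[" ".join(gram)] += 1
    return hits

if __name__ == "__main__":
    terms = load_terms(["asthma", "heart disease", "disease"])
    print(count_terms("Heart disease and asthma: disease everywhere.", terms))
```

Note that overlapping entries are all counted: here "heart disease" is found once and "disease" twice, since the single word also occurs inside the longer expression.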

Any help would be highly appreciated :-)

Georg

--
*******************************************************************************
* Mag. Dr. Georg Marko, M.A., Vertragsassistent
* Institut fuer Anglistik (Department of English Studies)
* Karl-Franzens-Universitaet Graz
* Heinrichstrasse 36, A-8010 Graz
* tel.: +43/316/380-2474
* e-mail: georg.marko@kfunigraz.ac.at
*******************************************************************************

"I drew a treasure map on your hand" Ani diFranco

