[Corpora-List] A few questions concerning WordSmith 4.0

From: Georg Marko (georg.marko@uni-graz.at)
Date: Mon Nov 27 2006 - 20:27:13 MET


    Dear all,

    I have some questions concerning work with Concord in WordSmith 4.0
    (please excuse my incompetence if I have overlooked an obvious mistake).

    In the 3.0 version I used the Collocation function to look for words
    with a particular suffix or ending (in case it covers more or less than
    a real morpheme). For this purpose I used a truncated search, e.g.
    "*ism", and then set the collocation horizon to 0/0. Strictly speaking,
    the programme then did not really calculate collocations as words
    appearing to the left or the right of the search string, but just
    produced a list of the centre words. As this, however, covered all
    -ism-words ("intellectualism", "capitalism", "occultism", etc.), this
    was exactly what I wanted.
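
    For comparison, the kind of list described above (every word matching
    "*ism", with its frequency) can be approximated outside WordSmith. The
    following is only a minimal sketch: the function name, the sample
    sentence and the simple letters-only tokenizer are assumptions made for
    illustration and do not reflect how WordSmith itself tokenizes or counts.

```python
import re
from collections import Counter


def suffix_frequencies(text, suffix):
    """Count every word in `text` that ends in `suffix` (case-insensitive)."""
    # Assumed tokenizer: runs of letters, optionally joined by internal
    # hyphens or apostrophes. WordSmith's own settings may differ.
    tokens = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text.lower())
    return Counter(t for t in tokens if t.endswith(suffix.lower()))


# Hypothetical sample text, standing in for a real corpus file.
sample = "Capitalism and occultism; intellectualism versus capitalism."
freqs = suffix_frequencies(sample, "ism")

# Print the matching words, most frequent first.
for word, count in freqs.most_common():
    print(word, count)
```

    The result corresponds to the list of centre words only; no
    collocates to the left or right are calculated.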

    Now in the 4.0 version, I can no longer choose a zero horizon - neither
    to the left nor to the right. This problem can, however, be solved by
    clicking on the centre column in the Collocations, which orders the
    words according to the frequencies at which they appear as the central
    word, which gives me the same results as the procedure just described
    for 3.0. The problem that I have is that the Collocation function does
    not give me the full version of the central word, but just the first two
    letters (e.g. "in", "ca", "oc"). If there are not that many, I may be
    able to guess the word, but in other cases this is impossible. As I am
    not the most intelligent and sophisticated user of WordSmith, I doubt
    that this problem is due to my challenging demands; it is more likely
    a result of my missing some setting option or something similar. But I
    seem to be unable to detect what I am missing.

    A second, probably similarly simple problem is that I seem to be unable
    to exclude words if working with a truncated search word. E.g. if
    looking for synthetic comparatives in English, using "*er" as my target,
    I would like the programme to ignore obvious high-frequency words such
    as "ever", "never", "her", "after" etc. This was possible with WordSmith
    3.0, but I cannot find the equivalent function in the 4.0 version.
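
    The intended behaviour (a truncated search for "*er" that skips a
    fixed set of unwanted words) can be illustrated with a short script.
    This is a hedged sketch, not the WordSmith function in question: the
    stoplist contains only the four words named above, and the function
    name, tokenizer and sample sentence are assumptions.

```python
import re
from collections import Counter

# Words to ignore, taken from the examples above; extend as needed.
STOPLIST = {"ever", "never", "her", "after"}


def er_candidates(text, stoplist=STOPLIST):
    """Count words ending in "er" that are not in the stoplist."""
    # Assumed tokenizer: lowercase runs of letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(
        t for t in tokens if t.endswith("er") and t not in stoplist
    )


# Hypothetical sample sentence, standing in for a real corpus.
sample = "She never felt stronger; her pace was faster after lunch, faster than ever."
candidates = er_candidates(sample)

for word, count in candidates.most_common():
    print(word, count)
```

    Such a filter only removes the listed forms; other non-comparatives
    ending in "er" would still have to be weeded out by hand.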

    The third problem concerns the use of search files. Examining corpora
    concentrating on particular discourses (e.g. women's magazines, wellness
    brochures, popular books promoting lifestyle changes, etc.), I have
    started to use files comprising more exhaustive lists of particular
    lexical fields, e.g. nutrients, social relations, diseases, etc. This
    allows me to compare the extent to which a specific discourse focuses,
    for instance, on nutritional aspects of food or takes a rather
    pathological view of life (at least on a superficial level). Now I have
    put together a long list of pathological terminology compiled from
    internet resources and some initial searches. This covers some 4,000
    expressions. I was not really surprised that WordSmith could not finish
    checking the occurrence of these expressions in a 600,000 word corpus,
    considering that I do not have an ultrafast computer. I was just
    wondering whether there is any limit to a search file (say 500 lines or
    something like that) with which you can successfully perform such
    searches even with a moderately fast computer.
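
    As a point of comparison, a long list of search expressions can be
    checked against a corpus in a single pass if the expressions are
    indexed by their first word. The sketch below assumes plain-text
    input, one expression per entry and a letters-only tokenizer; the
    function name and the sample terms are hypothetical, and it says
    nothing about any limit WordSmith itself may impose on search files.

```python
import re
from collections import Counter


def count_expressions(text, expressions):
    """Count occurrences of single- or multi-word expressions in `text`."""
    # Assumed tokenizer: lowercase runs of letters only.
    tokens = re.findall(r"[a-z]+", text.lower())

    # Index expressions by first word, so that each corpus position is
    # compared only against the few expressions that could start there.
    by_first = {}
    for expr in expressions:
        words = tuple(re.findall(r"[a-z]+", expr.lower()))
        if words:
            by_first.setdefault(words[0], []).append(words)

    counts = Counter()
    for i, tok in enumerate(tokens):
        for words in by_first.get(tok, ()):
            if tuple(tokens[i:i + len(words)]) == words:
                counts[" ".join(words)] += 1
    return counts


# Hypothetical search list and sample text.
terms = ["diabetes", "heart disease", "migraine"]
sample = "Heart disease and diabetes are common; diabetes is rising."
hits = count_expressions(sample, terms)

for term in terms:
    print(term, hits[term])
```

    With this indexing the corpus is read once regardless of how many
    expressions the list contains, rather than once per expression.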

    Any help would be highly appreciated :-)

    Georg

    -- 
    *******************************************************************************
    * Mag. Dr. Georg Marko, M.A., Vertragsassistent
    * Institut fuer Anglistik (Department of English Studies)
    * Karl-Franzens-Universitaet Graz
    * Heinrichstrasse 36, A-8010 Graz
    * tel.: +43/316/380-2474
    * e-mail: georg.marko@kfunigraz.ac.at
    *******************************************************************************
    

    "I drew a treasure map on your hand" Ani diFranco


