[Corpora-List] Re: Common connectors

From: FIDELHOLTZ_DOOCHIN_JAMES_LAWRENCE (jfidel@siu.buap.mx)
Date: Mon Apr 25 2005 - 03:02:54 MET DST

    Wallace Chen wrote:

    <<I am currently doing research on Chinese connectors, which have around
    270 types and broadly include conjunctions and sentence adverbs. These are
    derived from a five-million-word corpus of contemporary Chinese. My question
    is how to determine which ones are "common"? Are there statistical criteria
    (e.g. cut-off point) to determine "common connectors" from such a list?>>

    Xiao, Zhonghua (also known as Richard) wrote back:

    <<I think there is no established statistical norm for what should be
    considered as "common". Maybe we can take account of the two factors
    underlying Mike Scott's idea of "key keyword": frequency and dispersion. If
    an item is frequent and it also occurs in a large number of genres and/or
    texts in your corpus, it can be considered as "common". The cut-off points
    for frequency and coverage, of course, depend upon how many connectors you
    want to include in your study.>>
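
    [A minimal sketch of the frequency-plus-dispersion criterion described
    above, assuming the corpus is already segmented into tokens and split
    into texts. The function name, the sample tokens, and both cut-off
    values are hypothetical; as noted above, there is no established norm,
    so the thresholds are the researcher's choice.]

```python
from collections import Counter


def common_connectors(texts, connectors, min_per_million=100.0, min_range=0.5):
    """Select connectors that are both frequent and widely dispersed.

    texts: mapping of text id -> list of tokens (assumed pre-segmented).
    connectors: iterable of candidate connector strings.
    min_per_million: assumed frequency cut-off (occurrences per million tokens).
    min_range: assumed dispersion cut-off (share of texts containing the item).
    """
    connectors = set(connectors)
    freq = Counter()        # total occurrences of each connector
    text_count = Counter()  # number of texts in which each connector occurs
    total_tokens = 0

    for tokens in texts.values():
        total_tokens += len(tokens)
        hits = Counter(t for t in tokens if t in connectors)
        freq.update(hits)
        text_count.update(hits.keys())

    selected = {}
    for c in connectors:
        per_million = freq[c] / total_tokens * 1_000_000 if total_tokens else 0.0
        text_range = text_count[c] / len(texts) if texts else 0.0
        if per_million >= min_per_million and text_range >= min_range:
            selected[c] = (per_million, text_range)
    return selected
```

    [Raising or lowering either cut-off simply lengthens or shortens the
    resulting list, which is the point made above: the thresholds follow
    from how many connectors the study is meant to cover.]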

    Yuanyong Wang <wyy@cse.unsw.EDU.AU> also wrote back:

    <<... I think it depends on what you mean when referring to "common";
    there are different sets of common words for different domains. If the
    connectors play some role similar to that of functional words then I suggest
    they are all common (irrespective of domains). Regardless, 270 words
    extracted from a five-million-word corpus don't seem to be a very big
    set....>>

    While these comments are well taken, they are not the whole story,
    depending on what the researcher's interests are. As long ago as 1975
    (_Chicago Linguistic Society_), I showed that in English, in at least some
    cases, what counts as 'common' (I think I used the term 'familiar') depends
    on the phonological structure of the word, as far as vowel reduction is
    concerned. Thus, while frequency is indeed important and even crucial for
    many things in language, other factors may impinge on its effects. In
    English, for example, some quite rare words (e.g., 'berserk') act
    phonologically like common words because of their semantic/phonological
    saliency (you might even want to say outrageousness). In the same way, it
    *might* be the case (this is just a wild guess) that, for example,
    two-syllable connectors, or especially ones ending in a consonant cluster,
    could act differently from one-syllable ones, independently of their
    frequency. (Of course, unless Chinese has by now totally lost its
    monosyllabic character, this hypothetical example would not be valid for
    Chinese, but some other morphophonological characteristic might influence
    things.)

    Jim

    James L. Fidelholtz
    Posgrado en Ciencias del Lenguaje, ICSyH
    Benemérita Universidad Autónoma de Puebla MÉXICO


