Re: [Corpora-List] Common connectors

From: Yuanyong Wang (wyy@cse.unsw.EDU.AU)
Date: Sun Apr 24 2005 - 11:18:49 MET DST


        Hi Wallace:

               I'm doing a bit of research in NLP as well. I think it depends
    on what you mean by "common": there are different sets of common
    words for different domains. If the connectors play a role similar to
    that of function words, then I would suggest they are all common
    (irrespective of domain). Regardless, 270 words extracted from a
    five-million-word corpus does not seem to be a very big set. I guess
    you want to make some differentiation within the set itself; in that
    case, relative frequency would be useful. I don't know much about your
    research context, but I hope this sheds some light on the matter.
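
    To make the relative-frequency idea concrete, here is a minimal sketch
    in Python. It assumes you already have a raw count for each connector;
    the connector names and counts below are invented placeholders, not
    figures from any real corpus, and the function names are only
    illustrative. It normalises each count to occurrences per million
    words and also reports each connector's share of all connector tokens.

```python
def relative_frequencies(counts, corpus_size, per=1_000_000):
    """Return {connector: occurrences per `per` words}, highest first."""
    rel = {w: c * per / corpus_size for w, c in counts.items()}
    return dict(sorted(rel.items(), key=lambda kv: kv[1], reverse=True))


def share_within_set(counts):
    """Return each connector's share of all connector tokens (0.0 to 1.0)."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


# Invented placeholder counts, for illustration only.
toy_counts = {"conn_b": 2500, "conn_a": 5000, "conn_c": 500}
CORPUS_SIZE = 5_000_000  # corpus size in words, as in the original question

rel = relative_frequencies(toy_counts, CORPUS_SIZE)
shares = share_within_set(toy_counts)

for word, per_million in rel.items():
    print(f"{word}\t{per_million:.1f} per million\t{shares[word]:.1%} of connector tokens")
```

    Where to draw the line between "common" and "not common" in such a
    ranked list is still a judgment call; the sketch only orders the set.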

         Regards
         Robin.

    On Fri, 22 Apr 2005, Wallace Chen wrote:

    > Dear Corpora colleagues,
    >
    > I am currently doing research on Chinese connectors, of which there are around 270 types, broadly including conjunctions and sentence adverbs. These are derived from a five-million-word corpus of contemporary Chinese. My question is how to determine which ones are "common". Are there statistical criteria (e.g. a cut-off point) for identifying "common connectors" in such a list? Do I look at their frequencies or rankings? I would appreciate it if anyone could help me answer these questions or direct me to relevant resources. Thanks in advance for all your help!
    >
    > Wallace Chen


