Re: Corpora: What is a corpus

From: Lucian Galescu (galescu@cs.rochester.edu)
Date: Fri Jan 28 2000 - 01:48:41 MET


    It strikes me as ironic that corpus linguists would want to prescribe
    the usage of the word "corpus". Using Oliver's terminology, I would say
    that all corpora are `filtered'. Choosing 13th-century texts, or
    Shakespeare's plays, or conversations with a travel agent, or the Bible,
    etc., are all ways of filtering the abstract body of language
    around us for a specific purpose, since they all involve a criterion of
    what is in and what is out of the corpus.
    So, if Francois' purpose is to study proverbs, he could just as well do
    it using a corpus-based methodology (I'm not saying anything about
    whether that is appropriate or not -- it all depends on what his actual
    goals are). And if someone else wants to study the intra-sentential
    behavior of past tense verbs, they might just as well collect a corpus
    of past tense sentences. Btw, recently I have also heard of corpora of
    images, which go even farther away from the original "collection of
    texts" definition brought up by Paul Hays.

    I would agree with Oliver when he says:

       My understanding of `corpus' is that it is some more or less
       homogeneous collection of utterances, but not `filtered'

    if "homogeneous" meant that there is a criterion of selecting what is in
    and what is out; and (in order not to make the above 'definition'
    contradictory) "not `filtered'" meant that no further restriction should
    be imposed on the data, beyond the mentioned selection criterion (as
    Paul mentioned, this is sometimes hard to achieve).

    Have a beautiful day!
     -Lucian Galescu


