Re: Corpora: when does a subcorpus become a corpus?

From: Ute Römer (ute.roemer@uni-koeln.de)
Date: Fri Dec 28 2001 - 17:22:24 MET


    Dear John,

    I think the questions you asked about the representativeness of BNC
    subcorpora are very important!
    You asked:
    > What sort of representativeness do the 4 million or so words of academic
    > prose have, once they have been detached from the larger British National
    > Corpus?
    AND
    > Does this transplanted body of texts become less representative once it is
    > withdrawn from the co-text of the BNC and does it then become an
    > opportunistic corpus or a "quick and dirty" collection of texts?
    My answer to the last question would be "Yes, it does" (although I wouldn't
    call the subcorpus a quick and dirty collection of texts). It seems to me
    that a selection of 4 million words of English academic prose is too small
    to be representative of the whole range of EAP. However, the important
    question in this context is "What do you want to do with the (sub)corpus?" A
    4-million-word (sub)corpus is probably large enough for research on
    frequent lexico-grammatical phenomena (e.g. if-clauses or modals), but it
    might be too small for studies of less frequent
    words/lexemes/structures (and especially of multi-word items).
    When they compiled the BNC, the compilers (as you already mentioned)
    intended to create a representative collection of contemporary British
    English (and as far as I can tell the BNC IS such a representative sample).
    What we cannot be 100% sure about, however, is whether the compilers were
    also aiming to create representative collections of EAP (and other
    genres/subgenres) within their corpus. Maybe one of the BNC experts on
    the list can help?

    All the best from Cologne,
    Ute

    ute.roemer@uni-koeln.de



    This archive was generated by hypermail 2b29 : Fri Dec 28 2001 - 23:07:21 MET