Corpora: Summary of academic corpora responses

From: Paul Thompson
Date: Mon Jun 10 2002 - 14:39:19 MET DST


    Nearly a month ago I posted a set of questions to this list about the
    uses of corpora in research into academic discourse. Many thanks to all
    those who took the time to contact me with information and comments. I
    really appreciate the help!

    Here is a summary of the responses:

    Lynne Flowerdew referred me to her excellent review article,
    "Corpus-based analyses in EAP" in J. Flowerdew (Ed.) Academic Discourse,
    pp. 95-114. London: Longman (2002).

    Beatriz Méndez built a corpus of 8 articles of a High Impact Factor
    journal in the field of Radiology for her PhD thesis and used WordSmith
    Tools to analyse the corpus. She investigated combinatorial patterns to
    see how discourse is organised in medical articles.

    John Swales is working on a new book, "Research genres--explorations and
    applications", which makes use of Ken Hyland's 80-article corpus (cf. Ken
    Hyland's book "Disciplinary Discourses") and of MICASE (the Michigan
    Corpus of Academic Spoken English). He also used these two corpora in
    the writing of the second edition of "Academic writing for graduate
    students" (co-authored with Chris Feak).

    Andreas Eriksson drew my attention to studies of tense and aspect in
    academic discourse, recommending an MA dissertation by Li Vinh Taylor
    (available online) as a good overview of studies in this field.

    Monica Hill and Annie Mueller wrote about the 'Vocabulary for specific
    disciplines' project that they are involved in, at the English Centre,
    University of Hong Kong. Monica Hill is the principal grant holder for
    the project.

    Monica wrote:

    "Some colleagues and I have been working on a vocabulary project to help
    tertiary level students work on discipline specific vocabulary. We are
    investigating Law, Social Work, Business/Economics, Engineering and
    Medicine. Each of us has developed our own corpus by scanning in the
    text books the students use in Year 1 at university, relevant academic
    articles from the university press, and some general newspaper/magazine
    articles - so we have a variety of genres. Each of us has about 500,000
    words in our corpus, so it certainly isn't exhaustive, but by putting
    the corpus through a word frequency analyser, it provides us with the
    basic words the students need to know for that discipline ...

    ... The word frequency analysis is based on Nation and Laufer's Lexical
    Frequency Profile, details of which are on Nation's very informative
    site at Victoria University, Wellington. The academic words are from
    Coxhead's Academic Word List (also at LALS, Wellington).

    Using the profiler, we can highlight the different levels of words that
    we want to investigate. We can compare word frequencies across texts,
    search for the first 1,000 most frequent words, or second thousand,
    academic words and 'off-list' words which are those that have not
    already been included in the other lists. From this last group of
    words, we then identify which words are most relevant to our students'
    needs and we are developing a text based vocabulary learning programme
    containing exercises to help students use the words appropriately."
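
    As a rough illustration of the kind of banded frequency profile Monica
    describes, here is a minimal Python sketch. It is not the profiler the
    project uses: the function names (tokenize, frequency_profile) are
    hypothetical, and the three word lists are tiny invented stand-ins for
    the first-1,000, second-1,000 and Academic Word List files.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequency_profile(text, bands):
    """Count tokens per frequency band.

    bands is an ordered list of (band name, set of words). A token is
    credited to the first band that contains it; a token found in no
    band is counted as 'off-list' and also tallied individually.
    """
    counts = Counter()
    off_list = Counter()
    for token in tokenize(text):
        for name, words in bands:
            if token in words:
                counts[name] += 1
                break
        else:
            # No band matched: this is an 'off-list' word.
            counts["off-list"] += 1
            off_list[token] += 1
    return counts, off_list


# Toy word lists, invented for illustration only (assumption).
bands = [
    ("first 1,000", {"the", "in", "will"}),
    ("second 1,000", {"court", "claim"}),
    ("academic", {"analyse", "contract"}),
]

sample = "The court will analyse the contract in the tort claim."
counts, off_list = frequency_profile(sample, bands)

for name in ["first 1,000", "second 1,000", "academic", "off-list"]:
    print(f"{name}: {counts[name]}")
print("off-list candidates:", off_list.most_common())
```

    The off-list tally is the part that corresponds to the last step in the
    quoted description: it yields the candidate words from which the
    discipline-specific items would then be picked by hand.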

    Annie is looking at the "words 'engineers need to talk to each other'"
    and has so far compiled a corpus of approx. 300,000 words using text
    books and journals given to her or recommended by lecturers in the faculty,
    and several issues of an on-line journal. The corpus will be used to
    give ideas about the words in use in an engineering context.

    David Oakey wrote about his research work on lexical phrases in academic
    writing (1998-2001) using the MicroConcord B excerpts of "academic"
    prose available with MicroConcord, and the "academic writing" part of
    the BNC v1.0 based on David Lee's categorisations. This work has been
    written up for publication in the following two book chapters:

      * "A corpus-based study of the formal and functional variation of a
    lexical phrase in different academic disciplines in English." in Reppen,
    R., Biber, D., and Fitzmaurice, S. (forthcoming) (Eds.) Using Corpora to
    Explore Linguistic Variation. New York: John Benjamins

      * "Lexical phrases for teaching academic writing in English: corpus
    evidence." in Nuccorini, S. (forthcoming) (Ed.) ISP4 Proceedings (Title
    not yet known). Peter Lang

    Aquilino Sánchez sent information on the CUMBRE corpus, a 20-million-word
    corpus of contemporary Spanish (up to 1996), not specifically an
    academic corpus. Information can be found at:


    Original posting:

    >>I am preparing to write a review article on the uses that have been
    >>made of corpora in the study of academic discourse, such as in:
    >>* research into the vocabulary or grammar of academic discourse
    >>* rhetorical or discourse analysis of academic discourse
    >>* the preparation of teaching materials for Language for Specific
    >>  Purposes (for example, EAP) courses
    >>* the provision of data for students to investigate either in language
    >>  learning courses or in language study courses
    >>* study of discourse varieties, or in cross-linguistic comparisons
    >>I’d like to ask members of the corpora list who have used corpora for
    >>any of the above purposes (or know of others who have) the following
    >>questions:
    >>What corpus/corpora did you work with? How was the corpus compiled?
    >>What format is/was it in? Is it publicly available?
    >>How was the corpus analyzed / investigated? How many people,
    >>approximately, have used the corpus (if it is possible to put a figure
    >>on this)?
    >>When was the research / teaching done, and were there any end
    >>products, such as software, books, journal articles?

    Dr Paul Thompson
    School of Linguistics and Applied Language Studies
    Language Resource Centre
    P. O. Box 241
    The University of Reading
    Reading RG6 6WB
    Tel: 44 118 9316472
