Re: Corpora: Student needs info

From: Eric Atwell (eric@comp.leeds.ac.uk)
Date: Mon May 27 2002 - 13:43:44 MET DST

    Rodrigo,
    Please could you summarise any replies you get and post this summary
    back to the CORPORA list - this may be useful to others building
    corpora, including students here at Leeds University.

    I suggest one place to start is ICAME, the International Computer
    Archive of Modern and Medieval English, host of the CORPORA mailing list
    and of the ICAME website http://www.hd.uib.no/icame.html

    Info on the website which might help you includes Manuals for the corpora
    distributed by ICAME; most include background info on how the corpora were
    collected and tagged etc: http://khnt.hit.uib.no/icame/manuals/index.htm

    ICAME also publishes ICAME Journal, with back issues online on the website;
    ICAME Journal includes papers relevant to corpus building and tagging; you
    could start with paper(s) on the language genre(s) you are interested in, e.g.:

    Alejandro Curado Fuentes, "Exploitation and assessment of a Business English
    corpus through language learning tasks", ICAME Journal Vol.26 pp5-32, 2002

    Norma Pravec, "Survey of learner corpora", ICAME Journal Vol.26 pp81-114, 2002

    Ma Dolores Ramirez Verdugo, "Non-native interlanguage intonation
    systems: a study based on a computerised corpus of Spanish learners of
    English", ICAME Journal Vol.26 pp115-132, 2002

    Claudia Claridge, "Causal Clauses in written and speech-related genres
    in Early Modern English", ICAME Journal Vol.25 pp31-64, 2001

    Eric Atwell, George Demetriou, John Hughes, Amanda Schiffrin, Clive
    Souter and Sean Wilcock, "A comparative evaluation of modern English
    corpus grammatical annotation schemes", ICAME Journal Vol.24 pp7-24, 2000

    Merja Kytö, Juhani Rudanko and Erik Smitterberg, "Building a bridge
    between the present and the past: A corpus of 19th-century English",
    ICAME Journal Vol.24 pp85-98, 2000

    Winnie Cheng and Martin Warren, "Facilitating a description of
    intercultural conversations: the Hong Kong Corpus of Conversational English",
    ICAME Journal Vol.23 pp5-20, 1999

    Manfred Markus, "Getting to grips with chips and Early Middle
    English text variants: sampling Ancrene Riwle and Hali Meidenhad",
    ICAME Journal Vol.23 pp35-52, 1999

    Arja Nurmi, "The Corpus of Early English Correspondence Sampler (CEECS)",
    ICAME Journal Vol.23 pp53-64, 1999

    Tobias Rademann, "Using online electronic newspapers in modern English-language
    press corpora: Benefits and pitfalls", ICAME Journal Vol.22 pp49-72, 1998

    Minna Vihla, "Medicor: A corpus of contemporary American medical texts",
    ICAME Journal Vol.22 pp73-80, 1998

    Rainer Siemund and Claudia Claridge, "The Lampeter Corpus of Early Modern
    English Tracts", ICAME Journal Vol.21 pp61-70, 1997

    Gregory John Watson, "The Finnish-Australian English Corpus",
    ICAME Journal Vol.20, pp41-70, 1996

    Anneli Meurman-Solin, "A new tool: The Helsinki Corpus of Older Scots
    (1450-1700)", ICAME Journal Vol.19, pp49-62, 1995

    Roger Garside, "The marking of cohesive relationships: tools for the
    construction of a large bank of anaphoric data",
    ICAME Journal Vol.17 pp5-28, 1993

    Merja Kytö and Matti Rissanen, "A language in transition: the Helsinki
    corpus of English texts", ICAME Journal Vol.16, pp7-26, 1992

    Elizabeth Green and Pam Peters, "The Australian Corpus project and
    Australian English", ICAME Journal Vol.15 pp.37-54, 1991

    Brian MacWhinney and Catherine Snow, "The Child Language Data Exchange
    System CHILDES", ICAME Journal Vol.14 pp.3-25, 1990

    Louis Milic, "A new historical corpus", ICAME Journal Vol.14, pp.26-39, 1990

    Sidney Greenbaum, "The International Corpus of English",
    ICAME Journal Vol.14 pp.106-108, 1990

    Clive Souter, "The COMMUNAL project: extracting a grammar from the
    Polytechnic of Wales Corpus", ICAME Journal Vol.13, pp.20-27, 1989

    Nelleke Oostdijk, "A corpus for studying linguistic variation",
    ICAME Journal Vol.12, pp3-14, 1988

    Marion Owen, "Evaluating automatic grammatical tagging of text",
    ICAME Journal Vol.11 pp.18-26, 1987

    Pam Peters, "Towards a corpus of Australian English",
    ICAME Journal Vol.11 pp.27-38, 1987

    K Ahmad and G Corbett, "The Melbourne-Surrey Corpus",
    ICAME Journal Vol.11 pp.39-43, 1987

    Charles Meyer, "Punctuation practice in the Brown Corpus",
    ICAME Journal Vol.10, pp.80-95, 1986

    Barbara Booth, "Revising CLAWS", ICAME Journal Vol.9 pp.29-35, 1985

    Geoffrey Leech, Roger Garside and Eric Atwell, "The Automatic Grammatical
    Tagging of the LOB Corpus", ICAME Journal Vol.7 pp.13-33, 1983

    J M Gill, "The Gill Corpus", ICAME Journal Vol. 4 pp.7-8, 1980

    Louis Milic, "The Augustan Prose Sample and the Century of Prose Corpus",
    ICAME Journal Vol.4, pp.11-12, 1980

    ICAME Journal also includes reviews and abstracts of books and other
    publications relevant to corpus building and annotation, as "pointers"
    to the wider research literature. However, NOTE that some of the
    earlier papers cited above pre-date Windows-XP so the software may not
    be readily re-usable on today's Windows-based PCs :)

    Last but DEFINITELY not least, I recommend the searchable ICAME
    bibliography database recently put online by Knut Hofland:

    http://korpus.hit.uib.no/icame/bib_search.html

    I hope this helps

    Eric Atwell

    --
    Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
    School of Computing, University of Leeds, LEEDS LS2 9JT
    TEL: 0113-2335430  MOBILE: 0775-1039104 FAX: 0113-2335468
    WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric@comp.leeds.ac.uk
    --
    

    On Sun, 26 May 2002, Rodrigo Tadeu Gonçalves wrote:

    > Hi people,
    >
    > I'm looking for basic bibliography on corpus building, preferentially online
    > materials (so far I have only good intentions and no knowledge) and
    > Windows-based software for tagging and corpus building.
    >
    > Thanks in advance,
    >
    > Rodrigo T. Gonçalves


