Corpora: corpus of French academic texts

From: Tine Greidanus (
Date: Thu Jan 20 2000 - 13:34:51 MET

    Dear list members,

    I would like to make a frequency list of French academic words
    comparable to the Academic Word List (of English words) by Averil
    Coxhead (Victoria University of Wellington, New Zealand). This list
    consists of around 600 words (= word families) that are reasonably
    frequent in a wide range of academic texts (words like assume, achieve,
    concept, for example). These academic words are common in academic
    texts, but not so common elsewhere. Several studies have shown that these
    words are generally not as well known as technical vocabulary. An
    Academic Word List is thus very useful for university students doing
    their studies in a second language.

    There are, to my knowledge, no recent frequency lists of French words
    made for pedagogical purposes, let alone a list corresponding to the
    Coxhead list. There is a list compiled by the Crédif and published in
    1971, the Vocabulaire général d'orientation scientifique. It covers only
    the scientific fields (mathematics, physics, natural sciences). I intend
    to make a comparable list for the most important academic areas.

    My problem is the compilation of the corpus. The 'Institut de la Langue
    Française' (INALF) has an enormous corpus, called Frantext. It consists
    mainly of literary texts of the nineteenth and twentieth centuries, but
    there is also a subcorpus of 'textes scientifiques et techniques'. These
    texts are, however, rather old and rather biased, and thus not suitable
    for my purpose. So I have to find something else.

    The Coxhead list is based on a corpus of 3,500,000 running words. This
    corpus was divided into four 'faculty sections': Arts, Science, Law,
    Commerce. Each faculty section was divided into seven subject areas. The
    texts were journal articles, book chapters, course workbooks, laboratory
    manuals and course notes, and were representative of the academic genre.
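
    To make the idea of "reasonably frequent in a wide range of academic
    texts" concrete, here is a minimal Python sketch of how candidate words
    could be selected from a corpus divided into faculty sections. It is
    only an illustration, not Coxhead's actual procedure: the function
    name, the thresholds and the list of general words to exclude are all
    assumptions of mine, and it counts word forms rather than word
    families.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word forms (letters only, accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def academic_candidates(sections, general_words=(), min_total=100,
                        min_sections=None):
    """Return word forms that are frequent overall and widely distributed.

    sections      -- dict mapping a section name (e.g. "law") to a list of texts
    general_words -- hypothetical list of everyday words to leave out
    min_total     -- assumed minimum number of occurrences in the whole corpus
    min_sections  -- assumed minimum number of sections a word must occur in
                     (default: all sections)
    """
    if min_sections is None:
        min_sections = len(sections)
    excluded = set(general_words)
    total = Counter()          # occurrences in the whole corpus
    section_range = Counter()  # number of sections in which a word occurs
    for texts in sections.values():
        counts = Counter()
        for text in texts:
            counts.update(tokenize(text))
        total.update(counts)
        section_range.update(counts.keys())
    return sorted(
        word for word, n in total.items()
        if word not in excluded
        and n >= min_total
        and section_range[word] >= min_sections
    )
```

    With a real corpus the texts would be read from files, and the word
    forms would first have to be grouped into word families (or at least
    lemmatised) before counting.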

    As for myself, I am thinking of using (parts of) textbooks for the
    French DEUG program (the first two years of university). A corpus of
    3,500,000 words is equal to approximately 10,000 pages. I could purchase
    the textbooks and scan the pages, but this amounts to a lot of work I'd
    rather avoid. So I would be grateful if anyone could give me tips
    concerning existing electronic texts or corpora I could use. Does anyone
    have experience with publishers of university textbooks making
    electronic versions of their books available to researchers? Are there
    difficulties to be expected as regards copyright?
    All information, ideas, tips, etc. are very welcome!

    Tine Greidanus
    Vrije Universiteit
    Faculteit der Letteren
    De Boelelaan 1105
    1081 HV Amsterdam
    tel. 31 20 44 46 460

