[Corpora-List] RE: [Corpora-List] Parallel corpora and word alignment, WAS: American and British English spelling converter

From: Santos Diana (Diana.Santos@sintef.no)
Date: Thu Nov 16 2006 - 15:19:59 MET


    > > But it intrigued me to think of parallel corpora *within* a language.
    > > I suppose dialectal texts rendered into "standard" language or vice versa
    > > might come close... I need to muse some more on this.

    Well, current parallel corpora -- even merely bilingual ones -- may include different varieties of one language, which also makes them parallel corpora within one language. For example, COMPARA (www.linguateca.pt/COMPARA/) has some translations of the same original texts into different varieties (of English, and of Portuguese).

    I am sure that there are translation corpora (in the sense of having been compiled for the specific purpose of studying the translation process or result) that have multiple translations into the very same language -- and these may feature different varieties. They might be worth looking into.

    In any case, the point that a well-designed (bilingual) parallel corpus can also be used as
    - two monolingual corpora
    - two corpora of several varieties
    - two comparable corpora
    - two translation corpora
    and so on...
    has long been made by Stig Johansson when presenting the ENPC in the 90s. See e.g.

    Johansson, Stig. "On the role of corpora in cross-linguistic research", in S. Johansson and S. Oksefjell (eds.), Corpora and cross-linguistic research: theory, method, and case studies, Amsterdam: Rodopi, pp. 3-24.

    How much text you require for comparing varieties depends, of course, on the goal of your studies, but there are at least some corpora that already have the potential for that. In any case, you have to bear in mind that not all differences (for example in independently created translations) are due simply to differences in variety: after all, two different translators are two different creators, even though some of the differences may be related to the variety they speak/write.

    Best,
    Diana
    ---------------
    Diana Santos
    www.linguateca.pt
    Pólo de Oslo da Linguateca, SINTEF ICT
    Pb 124 Blindern, N-0314 Oslo, Noruega


