The comparison of languages is of great interest from both a theoretical and an applied perspective. It reveals what is general and what is language-specific, and is therefore important both for the understanding of language in general and for the study of the individual languages compared. The analysis has applications within lexicography, language teaching, and translation studies.
Recently there has been a revival of interest in contrastive studies, partially due to the increasing internationalization of society and the growing need for advanced bilingual and multilingual competence. At the same time, linguistics has become increasingly concerned with the study of language in context, with the emergence of fields like text linguistics, discourse analysis, and pragmatics. The time is ripe for text-based contrastive studies.
Text-based contrastive studies can benefit from the progress in computer processing of texts, which has been a major area of research at the Department of British and American Studies, University of Oslo, and the Norwegian Computing Centre for the Humanities, University of Bergen. The present project extends this work to computer processing of parallel texts.
The aim of the project is (1) to compile a parallel corpus of English and Norwegian texts for computer processing; (2) to develop tools for analysing parallel texts; and (3) to carry out studies of the structure and communicative use of the two languages on the basis of the corpus. Areas to be studied include:
Examples of more general questions to be addressed are: To what extent are there parallel differences in text genres across languages? In what respects do translated texts differ from comparable original texts in the same language? Are there any features in common among translated texts in different languages (and, if so, what are these features)?
The aim of studying translated texts is not to reveal translation mistakes, but rather to use the work of translators as a resource for contrastive analysis and the study of translation problems.
The parallel corpus is planned as an open text bank and will be expanded as allowed by the resources available. It is intended as a general research tool, available beyond the present project for applied and theoretical linguistic research. There will be two main parts:
A core corpus consisting of original texts and their translations (English to Norwegian and Norwegian to English). Initially, the focus has been on novels and fairly general non-fiction books. In order to include material by a range of translators, the texts of the core corpus are limited to text extracts (chunks of 10,000 words or more). Provided that there is sufficient funding, the amount and variety of text will be increased to include more specialized material, including legal texts. The current size of the corpus (November 1997) is approximately 2.6 million words.
A supplementary corpus containing texts that are not translations but are comparable to the core texts in terms of genre and text type. The supplementary corpus will serve to control for "translationese" (that is, features typical of translated texts) and, more generally, to increase the amount and variety of the material.
Stig Johansson, Oslo, project leader (language)
Knut Hofland, Bergen, project leader (programming)
Jarle Ebeling, Oslo, research fellow
Signe Oksefjell, Oslo, research assistant
Hilde Hasselgård, Oslo
Kay Wikberg, Oslo
The project is carried out in cooperation with a research group at the University of Lund (headed by Bengt Altenberg and Karin Aijmer) and with similar research teams in Belgium, Denmark, Finland, and Germany. The Nordic network "Språk i kontrast"/ "Languages in Contrast" is supported by Nordisk Forskerutdanningsakademi. Through the cooperation with other contrastive teams, the study can be extended to multilingual comparison. There are also important gains in corpus compilation.
The material will be used for theses at the M.A. and doctoral levels and for post-doctoral research. One doctoral thesis and several M.A. theses are in progress. Results from the project will be published in the form of articles and eventually in book form.
Hasselgård, Hilde. Forthcoming. 'Some methodological issues in a contrastive study of word order in English and Norwegian'. To appear in B. Altenberg and K. Aijmer (eds), Languages in Contrast. Papers from a Symposium on Text-based Cross-linguistic Studies in Lund, 4-5 March 1994.

Hofland, Knut. 1996. 'A program for aligning English and Norwegian sentences'. In S. Hockey, N. Ide, and G. Perissinotto (eds), Research in Humanities Computing 5. Oxford: Oxford University Press. 165-178.

Johansson, Stig and Knut Hofland. 1993. 'Towards an English-Norwegian parallel corpus'. In U. Fries, G. Tottie, and P. Schneider (eds), Creating and Using English Language Corpora. Amsterdam: Rodopi. 25-37.

Johansson, Stig, Knut Hofland, and Jarle Ebeling. Forthcoming. 'Coding and aligning the English-Norwegian parallel corpus'. To appear in B. Altenberg and K. Aijmer (eds), Languages in Contrast. Papers from a Symposium on Text-based Cross-linguistic Studies in Lund, 4-5 March 1994.

Johansson, Stig and Jarle Ebeling. Forthcoming. 'Exploring the English-Norwegian parallel corpus'. To appear in the Proceedings of the Sixteenth ICAME Conference, Toronto, May 1995.

Wikberg, Kay. Forthcoming. 'Using the English-Norwegian parallel corpus: Questions in English and Norwegian'. To appear in the Proceedings of the Sixteenth ICAME Conference, Toronto, May 1995.
Department of British and American Studies
University of Oslo
P.O. Box 1003, Blindern
N-0315 OSLO
Norway
Norwegian Computing Centre for the Humanities
University of Bergen
Harald Hårfagres gt. 31
N-5007 BERGEN
Norway
World Wide Web: http://gandalf.aksis.uib.no/enpc/
E-mail:
Stig.Johansson@iba.uio.no
Jarle.Ebeling@iba.uio.no
Hilde.Hasselgard@iba.uio.no
Knut.Hofland@hd.uib.no
Signe.Oksefjell@iba.uio.no
K.B.Wikberg@iba.uio.no