4.2 Computational linguistics in the European educational landscape

In order to assess the way CL courses and programmes are currently situated in the European educational landscape, a survey was conducted by ACO*HUM in March 1999.  The objectives of this survey consisted mostly of basic fact gathering, but to a certain degree, they also included an inquiry into teaching staff opinions.  An invitation to fill out the questionnaire was sent out to approximately one hundred persons at European departments teaching CL, as well as to several e-mail distribution lists.  One member per department was requested to fill out the questionnaire.  Of the 68 answers received, 63 were complete and without error.  The results are presented in an appendix to this chapter.  The remainder of this section is devoted to comments and conclusions drawn from this survey.

Before commenting on these survey results, a caveat must be given regarding coverage.  The distribution of answers among countries does not necessarily reflect the actual distribution of institutions among countries.  Our survey was sent to as many places as possible to reach a wide international coverage.  The majority of responses come from Spain and the United Kingdom, followed by France and Germany.  Although this seems to correspond roughly with intuitions regarding size, we have no guarantee that the sampling reflected in the answers is representative.  Also, for each of three universities, two different persons sent in answers.  We chose not to discard any answers.  The results should therefore be interpreted with care: they come from 63 staff members from 60 universities in total.

In the first place, the results of the survey reflect the current status of CL in education as a highly specialized subject.  In most of the responding institutions it is taught as a degree only at the masters level, whereas at undergraduate level it is mostly offered as a specialization course within another degree scheme.  Respondents have pointed out that this question about degrees lacked the option "not taught at all".  Apparently, the questionnaire's assumption that CL, if taught, is taught at all levels does not hold.  The results, which must therefore be interpreted with care, still seem to indicate that, on the whole, the impact of CL in European curricula is mostly restricted to the postgraduate level.  This may well be due to the administrative organization of the institutions: while a majority of respondents were from Computer Science departments, they are mostly located in Humanities faculties, where the field might be viewed as a specialization within other subjects.

Overall, there is roughly a balance between humanities and non-humanities in the answers.  This illustrates the strong interdisciplinary nature of the field.  The results are, however, widely different for different countries.  For Ireland, for example, all answers indicate that CL is taught at Computer Science departments, whereas the answers for Norway indicate that CL is taught only in Linguistics departments at humanities faculties.  This suggests that any international cooperation in the CL field must not be fettered by faculty boundaries.

With respect to the teaching staff dedicated to CL, the survey confirmed previous data gathered from the 18 institutions participating in the ERASMUS Inter-University Cooperation Programme (ICP) in Natural Language Processing, according to which the number of CL staff at most universities is rather small (1.8 per university, on average).  By contrast, the yearly number of new students starting on CL as a full degree or specialization is significantly higher (12.6, on average), about 40 % of whom are women.  At the same time, these results need to be interpreted with care, because they may include null results for sites that do not offer CL degrees or specializations at all, just isolated courses.

Only a small percentage of the enrolled students are taking part of their education abroad (3.2 %), and the number of visiting/exchange students from abroad is also very small (3.1 %).  This may be due to the lack of harmonization of curricula, but also to administrative problems in the recognition of courses among European institutions.  We recommend a continuation of the work begun in the ICP to foster student mobility in Europe.  Both industry, where most students go after finishing their degree (63.1 %), and academia (32.1 %), will benefit from this type of initiative.

The answers regarding core and application-oriented topics are relevant for the evaluation of curriculum development (see section 4.3).  Concerning the core topics preferred by the responding institutions, there seems to be agreement on the teaching of Parsing Algorithms and Formal Grammars, together with introductory courses on Linguistics, followed by Lexical knowledge, Formal semantics, Mathematics and Logic.  Pragmatic techniques are gaining ground, followed by areas such as Corpus Linguistics, Natural Language Generation and Statistical Methods.  Other subjects such as Language Pedagogy, Foreign languages, Quantitative Linguistics, or Connectionist Computing, to mention the ones provided by responding universities, are still in a minority at this level.  All this ties in remarkably well with the work pioneered by the ICP in NLP, indicating a real consensus across Europe on what constitutes core material.  The answers also indicate that speech is not negligible in CL.  This suggests that close cooperation must be established with international actions in speech communication, including the SOCRATES thematic network project on Speech Communication Sciences.

With respect to applications, Machine Translation, NL Interfaces and Information Retrieval are at the forefront, followed by areas such as Speech Technology, and Computer-assisted Language Learning.  Other related applications such as Computer-aided Translation, Speech and Image processing, or Information Design are only taught at a small number of universities.  There seems to be a general consensus, therefore, as to what must be the core topics to be taught, independent of whether the course is offered at undergraduate or postgraduate level.  We are unable to comment, however, on the teaching hierarchies at different sites, i.e. whether the teaching of one of the areas without knowledge of the other is precluded or not (via linked modules, or prerequisites), as the survey does not provide such information.

Regarding delivery of teaching, it can be concluded that lectures are still the dominant teaching method, but individual and group projects together (37.4 %) make up a substantial practical component.  Even though lecture attendance is listed as the dominant activity, the various forms of exercise, taken together, are equally important.  Web-based courses are still not widespread, but are gaining ground as complementary teaching aids in many European institutions.  Indeed, this is one of the innovations which is sought after by a significant proportion of the respondents in the near future.  The proportion of learning activities is still dominated by listening to lectures, although writing programs takes up half of the total amount of activities, followed by using ready-made computer tools and doing exercises on paper.  These proportions possibly reflect the distinction between students of linguistics, as users of computer tools, and students of computational linguistics, who, in addition, learn to develop such tools.  The writing of computer programs is only seen as part of the students' learning activities when the course is part of a full degree in CL, but it is not a requirement when it is an option, or specialization in another degree or subject.

While there is a surprisingly wide range of different textbooks in use, the need for a better textbook is also expressed.  As to the main programming languages used in teaching, there seems to be substantially more variety than before.  Prolog is, by some distance, the preferred one, followed by newcomer Java, then C and C++, and Lisp.  Others such as Perl and Pascal appear lower down still, followed by Elu, Visual Basic and Ada, each of which is taught in only one of the participating institutions.  Even if Windows is ranked as the most important platform, it is actually surpassed by the combination of Unix and Linux.

The main resources and tools used in most universities are corpora.  These are followed by more specialized tools such as WordNet, PC-Kimmo and the LFG Grammar Writer's Workbench, all used for different purposes.  The role of computer corpora is, therefore, gaining further ground as the main multi-purpose, widespread resource in the field.

With respect to employment, the results indicate that the opportunities in industry are twice as big as those in academia.  This result should be interpreted carefully since the categories are coarse.  The public sector is not listed as such.  Still, the results suggest that industry needs should not be ignored in the planning of education programmes in CL, especially since the output of graduates at present seems to remain below the demand in language research and the language industries.  This suggests that measures should be taken for recruiting more students.

In keeping with the percentage of responding institutions, Spanish and English appear as the main languages of instruction, followed at a distance by German, French, Dutch, Swedish, Norwegian and Italian.  Portuguese, Finnish and Czech are each used only by the one corresponding university which responded to the questionnaire.  For the same reason, English and Spanish are the preferred languages used as the target of processing, followed by French and Italian.  Interestingly, no respondent stated that they used German as the target language, while other languages such as Irish, Japanese, Russian, Polish, Catalan, Greek, Bulgarian, Slovenian, Chinese, Turkish, Galician, Basque and Hindi are used, independent of the language of instruction.  In fact, twice as many languages were listed as targets of processing as were listed as main languages of instruction.  This suggests that multilinguality is an important issue in CL.  Since the processed languages include several non-European ones, attention must be given to how the special needs for computer processing of these languages can be supported.  We refer to chapter 5 on non-European languages.

The mobility of CL students is indicated by the finding that about 6 % of the CL student population is abroad at any time.  43 % of the programmes require an obligatory placement or traineeship.  Combined with the mobility and multilinguality figures, this indicates that an international database of placement opportunities would be welcome.

Summing up, CL is a small but important, highly interdisciplinary field showing a lot of diversity and in need of international support and coordination.  The way in which CL is institutionalized in higher education across Europe should be approached with an eye for its historical background.  Linguistics was probably the earliest humanities discipline to adopt a computational approach, such that CL was already an established discipline in its own right before the wider area of humanities computing became established.  Moreover, CL has always been a truly multidisciplinary endeavour, not just a development within the humanities.  Thus, a high participation of non-humanities departments is to be expected.  There is great diversity in Europe with respect to embedding and approaches.  Even if this diversity is an asset, it is also in need of international dialogue and intensified cooperation in the face of its present and future challenges: student numbers that are too low, the industry's urgent need for competence, even more interdisciplinarity (speech; new programming languages), increasing multilinguality, and student mobility.