Co-operation between linguists and computational linguists is needed in updating both CL and Linguistics curricula. In this respect, ACO*HUM aims to bring advanced computing to as many humanities students as possible, rather than merely drawing a distinction between those who use advanced computing in the humanities and those who do not.
Core Course/Topic | Contents |
---|---|
Introduction to & Basics in Linguistics | Phonetics/Phonology; Morphology; Syntax; Semantics; Pragmatics; Introduction to Grammars |
Computational Linguistics | Parsing; Lexical Knowledge; Formal Semantics; Pragmatics; Applications |
Symbolic Computation | Lisp and/or Prolog; a procedural language; Program Design; Data Structures and Algorithms |
Mathematics/Logic | FOPC; Set Theory; Formal Languages and Automata; Probability Theory |
Grammar Formalisms | Constraint Grammars; GB; other state of the art grammars |
AI | Search; Reasoning and Theorem Proving; Knowledge Representation; Learning |
Table 4.1: Core Courses in NLP
These proposals correspond remarkably well to the results of the survey of March 1999. Regarding core topics, the responding institutions seemed to agree on Parsing Algorithms and Formal Grammars, introductory courses on Linguistics, Lexical Knowledge, Formal Semantics, and Mathematics and Logic.
It was envisaged that these courses could be taken by students from any site at any other site. As these courses could be embedded in different types of curricula (e.g. languages, linguistics, computer science, psychology etc.), the exact number of courses and the place in the curriculum where they are taught would differ from site to site. It is interesting to note that if this work were to be attempted now, then in all probability (no pun intended!) a separate strand on Statistical NLP would be advocated. At the time this core material was proposed, however, Statistical NLP was still very much an emerging discipline, and it was not clear whether it would stand the test of time; it was therefore excluded.
Of course, as well as these core components, there are a number of specialization modules, the whole being summarized as follows (x indicates "is taught"; - indicates "is not taught"; o indicates "is optional"):
Course | Sheffield | DCU | Saarbrücken | Tilburg | UMIST |
---|---|---|---|---|---|
Introduction to Linguistics | x | x | x | x | x |
Prolog | x | x | x | x | x |
Math/Logic | x | x | x | x | x |
Procedural Programming | x | x | x | o | x |
Logic/Semantics | x | x | o | x | o |
Software Engineering | x | x | - | Project | - |
Algorithms/Data Structures | x | x | x | - | - |
Grammar | x | - | x | x | x |
Introduction to CL | x | x | x | x | x |
LISP | x | - | o | x | x |
Artificial Intelligence | x | x | o | x | x |
Parsing | x | x | o | x | x |
Grammar Formalisms | x | x | x | - | x |
Pragmatics | x | - | x | - | x |
Formal Semantics | x | x | x | x | - |
Vision/Robotics | x | x | - | - | - |
Complexity Theory | x | x | x | - | - |
Philosophy | x | x | - | - | - |
Corpus Analysis/Empirical Methods | x | x | x | o | x |
Psycholinguistics | x | x | o | x | x |
Sociolinguistics | x | - | - | x | x |
Acoustics/Phonetics | x | x | x | - | x |
Speech | x | x | x | - | x |
Machine Translation | - | x | x | - | x |
Info. Retrieval | - | - | - | x | - |
Human Computer Interaction | x | - | - | x | - |
Table 4.2: Specializations taught at selected sites teaching NLP
These specialization modules obviously reflect the different expertise of different sites, and under our schema would ultimately make possible a wider choice of specialization to all students in the ICP. Other modules which could be included are language-dependent NLP (French, Irish, Dutch, German NLP) and other application-specific NLP (Dialogue systems, Software localization etc.).
The properties of the programme can be summarized as follows:
Topic | Gøteborg | DCU | Saarbrücken | Sheffield | UMIST | Trondheim |
---|---|---|---|---|---|---|
Ling/Grammar | 40 | 45 | 45 | 15 | 45 | 55 |
Math/Logic | 15 | 10 | 20 | 15 | 10 | 5 |
Computing | 45 | 40 | 30 | 45 | 35 | 15 |
CL | 20 | 15 | 25 | 15 | 15 | 45 |
AI | 0 | 10 | 0 | 30 | 15 | 0 |
Table 4.3: Proportion of material taught at selected sites in terms of ECTS
Each site's column sums to 120, as this core material is envisaged as taking the equivalent of two years' study, and ECTS allocates a maximum of 60 credits per year's study. Even with such a general classification scheme, one can see the very different emphasis placed on certain topics in different institutions. There may, of course, be philosophical, geographical or historical issues underpinning such choices, but a much more compelling explanation is that they reflect the skills and interests of the staff at the sites concerned.
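The arithmetic behind Table 4.3 can be checked with a short script. The following is a minimal Python sketch, not part of the original study: the figures are transcribed from the table, while the names `ECTS_BY_SITE`, `EXPECTED_TOTAL`, `site_totals` and `topic_ranges` are illustrative assumptions introduced here.

```python
"""Sanity check for the ECTS figures in Table 4.3 (illustrative sketch)."""

# Credits per topic area at each site, transcribed from Table 4.3.
# The dictionary and function names are hypothetical, chosen for this example.
ECTS_BY_SITE = {
    "Gøteborg": {"Ling/Grammar": 40, "Math/Logic": 15, "Computing": 45, "CL": 20, "AI": 0},
    "DCU": {"Ling/Grammar": 45, "Math/Logic": 10, "Computing": 40, "CL": 15, "AI": 10},
    "Saarbrücken": {"Ling/Grammar": 45, "Math/Logic": 20, "Computing": 30, "CL": 25, "AI": 0},
    "Sheffield": {"Ling/Grammar": 15, "Math/Logic": 15, "Computing": 45, "CL": 15, "AI": 30},
    "UMIST": {"Ling/Grammar": 45, "Math/Logic": 10, "Computing": 35, "CL": 15, "AI": 15},
    "Trondheim": {"Ling/Grammar": 55, "Math/Logic": 5, "Computing": 15, "CL": 45, "AI": 0},
}

# Two years of study at a maximum of 60 ECTS credits per year.
EXPECTED_TOTAL = 2 * 60


def site_totals(table):
    """Return the total number of credits listed for each site."""
    return {site: sum(credits.values()) for site, credits in table.items()}


def topic_ranges(table):
    """Return (minimum, maximum) credits across sites for each topic."""
    topics = next(iter(table.values())).keys()
    return {
        topic: (
            min(credits[topic] for credits in table.values()),
            max(credits[topic] for credits in table.values()),
        )
        for topic in topics
    }


if __name__ == "__main__":
    for site, total in site_totals(ECTS_BY_SITE).items():
        status = "ok" if total == EXPECTED_TOTAL else "mismatch"
        print(f"{site:12} {total:4d}  {status}")
    for topic, (low, high) in topic_ranges(ECTS_BY_SITE).items():
        print(f"{topic:12} ranges from {low} to {high} credits")
```

With the figures as given, every site totals 120 credits, while Ling/Grammar ranges from 15 to 55 credits and AI from 0 to 30, which makes the difference in emphasis between sites concrete.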
The final phase of this integration under the aegis of the ICP has already been completed by all partners. Progress was hampered, as before, by the many bureaucratic and administrative problems encountered when trying to adapt the different curricula at the different institutions to incorporate the basic module. While we remain committed in principle to developing a joint curriculum in NLP, getting each university to adopt our model will be a much greater obstacle to progress, particularly when it comes to the establishment of treaties between universities for mutual recognition. However, such an understanding may be more readily achievable at postgraduate level, hence our involvement in the development of a European MSc in Language and Speech.
As for a European dimension in our language-specific modules, this can be taken almost as a given. Obviously parsing or speech processing is taught in (say) UMIST or DCU with English in mind, whilst in Saarbrücken the focus is on German. Likewise, when machine translation is taught, the focus would be on translation into the native tongue. Our efforts regarding Open and Distance Learning (ODL) techniques are documented elsewhere in this chapter.
Obviously we hoped that our work on curriculum development would be used by other universities setting up proposed NLP programmes, and indeed members of the ICP have been contacted by other parties interested in bringing in new courses in NLP, following publicity given to this via the WWW and papers at conferences. In addition, with the work under the ICP having come to a close, we have continued this groundwork under the SOCRATES networks ACO*HUM and Speech Communication Sciences, or in further joint efforts with other interested groups.
One thing which we wanted to do was to broaden the user base so as to reflect more exactly the state of teaching in CL throughout Europe. The ICP in NLP contained just 18 universities. The number of institutions within ACO*HUM expressing an interest in CL is 52, so even within our network there is much to be learned from these new partners. Nevertheless, there are a number of excellent CL sites which are not in ACO*HUM, whose views needed to be embraced. This was one of the reasons the CL group within ACO*HUM conducted a survey of major universities with expertise in CL, some of the results of which are discussed elsewhere in this chapter.
Nevertheless, our efforts are genuinely oriented towards the definition of best practice in defining commonly agreed curricula, at least at the level of common core material. We feel that this work is innovative and hopefully of use to the wider NLP community, and that it will offer advantages for mobility. Here, for instance, certain institutions could modify their curricula with respect to courses offered by other institutions if they became aware of certain omissions in their curricula, should they feel that this lack of certain knowledge hinders the progression of their students. We feel that mobility of students and staff would be greatly facilitated by adopting a common starting point as entry for different specializations at different places. Thus, a certain harmonization at basic levels will promote diversity at more advanced levels.
In addition, knowing what students can be expected to have learned prior to their coming to one's university from abroad can help lecturers anticipate any particular problems, and can provide invaluable information when it comes to selecting an appropriate course of study abroad. Also, we envisage that a core curriculum could serve as a baseline, or model of good practice, for comparison by other universities that are hoping to set up a course in CL, or that may want to bring their courses more into line with what is being taught at those universities at the forefront of NLP research (particularly in the Eastern European context). Finally, an international agreement on common core elements in CL curricula could simplify the match between job profiles and competencies; this could be a significant advantage for the language industries when recruiting from many countries to cover a wide group of languages.
Notwithstanding these benefits, we want to head off right now one particular criticism which can be anticipated. That is, we want to stress that our proposals in favour of a common core curriculum are entirely voluntary, and should not be considered an effort to standardize European education in the least. The adoption of good practice in commonly agreed curricula should spring from the university's desire to offer advantages to students and should not be imposed from above. We hope it is clear that we want to maintain the specializations which exist: rather than homogenize all NLP teaching, we want to make such specializations available to more students, so as to widen the choice of material open to them, and in so doing enhance their learning experience. If successful, this will provide still more diversity, lead to more student and staff mobility than currently exists, and make mobility more fruitful.
However, the field of CL is developing very rapidly, and therefore the definition of best practice in CL curricula is a moving target which will require continuing co-ordinated efforts in the future. Most of all, the field is becoming even more interdisciplinary than before; increasingly, new programmes cut across faculty boundaries. We refer in this respect to the programme Multimedia for Knowledge Transfer at the University of Leiden, The Netherlands, which offers an interdisciplinary major to students of Psychology, Computer Science, Linguistics, Art History and Education Sciences. The programme covers multimedia (graphic design, sound, text, video, animation) in an integrated and useful manner to present and register information and knowledge. Natural Language Processing makes up 3 of the 14 minimal core credits in this nominally one-year programme, which includes a project.
Similarly, the Master's degree in Intelligent MultiMedia at Aalborg University in Denmark focusses not only on text and speech but also on vision and their mutual integration. The goal is to implement a one-and-a-half-year Master's in which students will be required to spend at least three months at another institution in another European country. The idea is also that students will be able to avail of expertise at another institution which may not exist at their own. CL is represented in this Master's through three content descriptions: theoretical linguistics, natural language processing, and language engineering applications. Group project work is an essential learning mode throughout the programme.
We support the work done under the CDA for developing the curriculum for a pan-European Master's degree course in language and speech, which will be taught from October 1999. This project, which is detailed further in Bloothooft (1999b) and Bloothooft et al. (1998b), has a somewhat wider scope than CL. Its aim of integrating speech processing with natural language processing seems entirely appropriate at this point, given developments in society, especially in the language industries. Such integration is also supported by the results of the survey.
Nevertheless, we reiterate that any proposals here remain just that: we do not advocate a bland, homogeneous methodology throughout the continent; rather, we hope, provided such common groundwork is in place, that students will be able to avail of the rich range of specialization modules on offer in a more mobile structure than currently exists. This will lead to a more diverse workforce, more cross-cultural communication, and greater understanding of our partners throughout Europe than is presently the case. This can only be encouraged, and sought after ever more vigorously. Finally, we feel that all courses should conform to the practice of ECTS to enable compatibility and facilitate comparisons between courses.
In sum, we recommend the following: