Chapter 3
Anthea Ballam
King's College, London
Donald Broady
Royal Institute of Technology/Nada, Stockholm
Lou Burnard
Oxford University
Elisabeth Burr
University of Duisburg
Stuart Lee
Oxford University
Lisa Lena Opas
University of Joensuu
Thomas Rommel
University of Tübingen
Although his pioneering work aroused some interest during the following decade, it was in the later 1960s and early 1970s that a recognizably European dimension to the application of computing in the textual disciplines began to emerge, with significant work being done in a number of universities across Europe, notably in Italy (e.g. Pisa and Rome), Germany (e.g. Tübingen), France, the Scandinavian countries and the UK (Cambridge, Oxford and King's College London).
This new momentum and the developing sense of a community of scholars interested in exploiting the new techniques were marked by the founding of the Association for Literary and Linguistic Computing in 1973. However, it would be true to say that in this period the main emphasis in the application of computing was on research rather than teaching. Starting in the 1970s, Europe saw the gradual concentration of this research in institutional and national research centres, as exemplified by the establishment of a national Norwegian Centre for Computing in the Humanities at Bergen in 1972.
During the 1980s and the early 1990s, the possibilities of the new technologies became more widely recognised, particularly when the significance of the personal computer revolution started to become apparent. It was in this period that the potential for computing to transform teaching and learning as well as research in the text-based disciplines began to receive attention, and courses and programmes formally involving the application of computing were initiated in a number of European institutions (as discussed in more detail in Section 3.3 below). In some countries this was accompanied by national initiatives, such as the Computers in Teaching Initiative in the UK. Although this initiative was broadly based, it involved the setting up of two centres relevant to the text-based disciplines - the CTI Centre for Textual Studies at Oxford, and the CTI Centre for Modern Languages at Hull.
The role of computing in learning and teaching is now widely understood, not only in terms of producing new generations of researchers and university teachers trained in the application of formal methods and able to apply the new technologies to their own work, but also in terms of ensuring that a substantial majority of European citizens and workers understand computing, are technically adept when they enter the workforce, and are equipped to contribute to the political and cultural life of their society within a wider international context that is increasingly dominated by technology.
One natural development from the early work has been the increasingly sophisticated application of statistical methods. These are the cornerstone of stylometric studies in general, and of authorship attribution research in particular. A number of statistical packages are used, all developed for more overtly 'statistical' areas of activity. Another example of techniques and software developed for other disciplines playing a role in literary studies is the use of cladistic analysis, from the biological sciences, in tracing the 'genealogies' of manuscripts.
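By way of illustration, the sketch below (in Python, with invented file names and a deliberately reduced list of function words) shows the kind of frequency-based comparison on which many stylometric and attribution studies build; real studies use much richer feature sets and more robust statistical measures.

```python
# Illustrative sketch only: compare texts by the relative frequencies of a
# few common function words, a simplified version of the features used in
# authorship attribution. File names and the word list are invented.
import re
from collections import Counter
from pathlib import Path

FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "his"]

def profile(path):
    """Relative frequencies of the selected function words in a plain-text file."""
    words = re.findall(r"[a-z']+", Path(path).read_text(encoding="utf-8").lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Mean absolute difference between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)

# Hypothetical usage: the disputed text is attributed to whichever candidate
# author's sample profile lies closer to it.
# print(distance(profile("disputed.txt"), profile("author_a.txt")))
# print(distance(profile("disputed.txt"), profile("author_b.txt")))
```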
Text mark-up is another major area of activity which grew out of the early work. Mark-up can be used for something as basic as recording the structure and appearance of a source text, but is now widely used for more sophisticated analytical approaches, not only by encoding parts of speech to assist study of an author's use of language, but also by adding semantic or other encodings to enable more overtly interpretative analyses. The work of the Text Encoding Initiative (TEI) - funded in part by the European Commission - has played a significant role in the development of this work, ensuring that the effort that goes into mark-up can be re-used by other scholars and preserved for the future. By adopting SGML as the basis for its recommendations, the TEI ensured that scholars would be able to take advantage of software tools developed for the commercial world.
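As a simplified illustration, the following Python fragment queries a small TEI-style encoding in which each word carries a part-of-speech attribute; the element and attribute names are reduced for the example, and the fragment is not a complete TEI document.

```python
# Simplified illustration: a TEI-style line of verse in which each word is
# tagged with a part-of-speech attribute, queried with the standard library.
# The markup and attribute names are reduced examples, not full TEI.
import xml.etree.ElementTree as ET

SAMPLE = """
<l n="1">
  <w pos="adjective">Pleasant</w> <w pos="noun">exercise</w>
  <w pos="preposition">of</w> <w pos="noun">hope</w>
</l>
"""

line = ET.fromstring(SAMPLE)
nouns = [w.text for w in line.iter("w") if w.get("pos") == "noun"]
print(nouns)  # ['exercise', 'hope']
```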
With the development of XML as a new text encoding standard, the significance of the TEI for textual scholars is likely to increase further. It is also true that the analytical needs of scholars are by no means identical to the commercial objectives of publishers and others, so that there is likely always to be a need for the development of software that is specific to research and teaching. At the same time, there is likely in turn to be potential for the wider application of such software and analytical methods in the commercial world in due course.
The dramatic reductions in digital storage costs have made it possible for new approaches to be followed in scholarly editing. It is now feasible to include transcriptions of all known sources for a particular work, along with images of the source manuscripts. Using mark-up, the editors can highlight the variations between sources.
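The sketch below illustrates, in a deliberately simplified and hypothetical form, the first step of such work: locating the points at which two witness transcriptions diverge, before the variants are encoded; the witness readings are invented for the example.

```python
# Simplified illustration: locate divergences between two witness
# transcriptions as a preliminary to encoding them as variant readings.
# The witness readings are invented examples.
import difflib

witness_a = "the quality of mercy is not strained".split()
witness_b = "the qualitie of mercy is not strain'd".split()

matcher = difflib.SequenceMatcher(a=witness_a, b=witness_b)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print("variant:", " ".join(witness_a[i1:i2]), "/", " ".join(witness_b[j1:j2]))
```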
Low storage costs have also made possible the increasing production of large digital resources. Dictionaries, mono-lingual as well as multi-lingual, were among the first of this type. However, increasingly large corpora of works are being produced, so that it is now realistic to expect to find 'all of Shakespeare', or 'all of 19th century French fiction', or 'all of Classical Greek literature', and so on. This in turn is enabling new kinds of scholarship.
This phenomenon also makes it possible for large multi-media resources to be created, so that image and sound material of all kinds can be stored alongside texts. This is serving to break down some of the barriers between disciplines, and between types of discipline - e.g. between the literary scholar and the historian or the art historian - so that tools developed for either of the latter groups become grist to the mill of the former.
The proliferation of the Internet and the associated increase in bandwidth have made digital resources of all kinds, whether small or large, more and more widely available. The Internet explosion seems set to continue as it becomes driven increasingly by the commercial sector, but it seems clear that XML will play a significant role in this commercial development, which should enable the text-based disciplines to continue to take advantage of it.
Increasingly, the essential reference works needed by researchers and students are available in electronic form (and often on-line). However, when these works are owned by commercial publishers, the electronic versions often come at a considerably higher price than printed books.
The possibilities for advanced study are enormous. Large corpora make it possible to carry out new kinds of analyses that take into account all the works of an author or all the works of a particular location, period or genre. Moreover, the integration of different types of materials - e.g. texts and images - enlarges the scope of interest of the textual scholar and encourages broader disciplinary views; this also encourages more inter-disciplinary approaches.
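A hypothetical sketch of the simplest such corpus-wide operation - counting occurrences of a search term in every text of a collection - is given below; the directory layout, file naming and search term are assumptions made for the example.

```python
# Hypothetical sketch: count occurrences of a search term in every plain-text
# file of a corpus directory. Directory layout and naming are assumed.
import pathlib
import re

def corpus_hits(corpus_dir, term):
    """Per-file occurrence counts of `term` across a directory of .txt files."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return {
        path.name: len(pattern.findall(path.read_text(encoding="utf-8")))
        for path in sorted(pathlib.Path(corpus_dir).glob("*.txt"))
    }

# e.g. corpus_hits("nineteenth_century_fiction", "railway")
```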
As an example of a project using computer methods in edition philology, we mention The Wittgenstein Archives at the University of Bergen. The aim of this project has been the transcription of Wittgenstein's complete literary estate (Nachlaß) into machine-readable form, the development of software for the presentation and analysis of the texts, the provision of access to the machine-readable transcriptions for visitors and scholars at the University of Bergen, and the publication of an electronic facsimile and machine-readable transcriptions of Wittgenstein's Nachlaß on CD-ROM. The work was completed in 1999 and the CD-ROM volumes are being published in cooperation with Oxford University Press.
Wittgenstein's Nachlaß presented numerous problems for publication. Since Wittgenstein himself never prepared more than a negligible fraction of his writings for publication, most of his manuscripts and typescripts are full of various annotations, deletions, insertions, marginal remarks, critical instructions and cross-references, alternative formulations for particular phrases, and even writings in secret code (Geheimschrift). Neither is it always clear which of such alternative formulations he finally decided upon. In order to reproduce the texts as completely as possible, The Wittgenstein Archives have developed their own text coding system MECS (Multi-Element Code System), which provides the basis for specially designed software that offers wide flexibility in the presentation and analysis of the texts.
The increasing sophistication of computing techniques and tools makes it essential to develop new kinds of collaboration between textual scholars and technical experts, including computer scientists and engineers, encompassing both the theoretical and applied domains. Also, the boundaries between 'research' and 'teaching', and especially between the materials used in each, are narrowing. Increasingly it is possible for students to work with source materials in digitized forms, and to use the same reference works as researchers.
As a consequence, the role of academic support staff and the nature of their relationship with researchers, students and teachers is changing. The developments in the services offered by 'the digital library' or 'the hybrid library' are now as significant for textual scholars as for any other group. Also the relationship between scholars and publishers is changing dramatically, affecting for example the selection and preparation of teaching materials.
The relationship between teachers and students and the wider 'cultural heritage' sector is also undergoing a profound change. The availability for study of cultural objects, wherever they reside in their original form, is changing learning and teaching in universities, and in return the work of students and researchers is serving to enrich understanding of, and widen access to, those objects.
First, teachers need themselves to understand the technologies and how they can be applied in their disciplines. This in turn means they must receive the necessary training. If they have not themselves had the opportunity for this during their own undergraduate studies then basic training is needed, but in the context of rapid changes, which are likely to continue for the foreseeable future, opportunities are also needed for regular updates and further training. This poses the question of how this training is to be provided.
At the same time, teachers need support from technical specialists, so they can concentrate on the discipline aspects rather than on the technologies. This poses the questions of what technical depth is appropriate for the textual scholars, and what should be provided by means of institutional or other support.
There are a number of different and successful models for how textual scholars can best be provided with specialist support. The key questions here are how institutions that already have structures in place can best develop them, and how others can best put support structures in place.
Students need to acquire basic computing proficiency. This should be gained before or outside their textual studies. This raises the question of how institutions should provide it, at least until they can expect that every student will arrive from secondary school with the necessary skills. There are a number of models for the inclusion of computing components in textual studies curricula, with two important types of model, broadly speaking. In the first of these, the interdisciplinary aspects of the appropriate formal methods and techniques form the basis of components that are taught to students in a number of disciplines; in the second, those computing techniques and tools that are held to be important for a specific discipline are included in courses offered in that discipline. Both approaches have their particular advantages, and in many cases they are related to institutional history and culture. However, in both cases some important questions arise:
A key aspect of computing in textual studies (as in other disciplines) is the need to develop students' analytical skills, which in turn make them particularly attractive in the wider labour market. This raises issues of how to ensure that prospective employers understand that the students have more to offer than merely the ability to use, for example, a database or a spreadsheet package.
The availability of user-friendly statistical packages makes it possible to consider the inclusion of statistical methods in all humanities education, including the textual disciplines. One question raised by this idea is whether the general benefit to society of having truly numerate citizens is outweighed by the difficulties of applying measurements to humanities data in an appropriate way and the possible resistance to the idea on the part of humanities students.
The current opportunities for teachers interested in the subject include information, on paper and on websites, from those institutions where computing is included in some way in the textual disciplines, as well as gateway sites, such as HUMBUL, based at the University of Oxford. The two main professional associations in this area - the Association for Literary and Linguistic Computing (ALLC) and the Association for Computers and the Humanities (ACH) - also provide support. The ALLC, for example, has a programme of workshops and seminars aimed at training teachers and researchers. It also has a formal involvement in ACO*HUM.
Information exchange is facilitated by journals, including those of the two associations and a number of more specialised publications. There are also a number of conferences at which teaching and curriculum issues are covered, including the joint annual conferences of ALLC/ACH and the annual Digital Resources for the Humanities conferences. There are a number of other conferences that are relevant, including those of ELRA, the European Language Resources Association.
However, it will be clear that the overview below is limited, and that more systematic action is needed towards a comprehensive mechanism for collection and dissemination of information, if European countries are to exploit the full potential of curriculum development in these areas.
The facilities described in this section represent a small sampling of existing institutions, and the examples were not selected at random for statistical purposes. Instead they were selected from among institutions known to the Working Group members. But, as was pointed out earlier, the institutions described in this section show widely differing approaches to the selection of services made available for Textual Scholarship and Humanities Computing (TS & HC).
The use of computer tools clearly requires training. There is no reason for specialists in TS & HC to train novices in basic computer literacy, although this may in fact be done at certain institutions; this reflects local practices but is not related to TS & HC as such. On the other hand, there is a gradual scale from basic computer literacy, through training in specialised and/or advanced software and hardware, to training in computational methods in specific TS & HC related disciplines. Some of this training falls under the heading of the next section, "Computing in Textual Scholarship Courses", while some of it will take place on an ad hoc basis where it is needed, rather than as organised courses.
Apart from consultancy and advice at a variety of low to intermediate levels, specialist support for users with a high level of competence is needed. This helps to focus and channel resources and create an awareness of existing tools, methods and techniques. Re-inventing the wheel should be avoided, and specialist support as provided by TS & HC is indispensable in most areas of interdisciplinary research.
Another central aspect of work in TS & HC is the synergetic effect of sharing knowledge and resources. Interdisciplinary in nature, TS & HC brings together experts from different fields that share fundamental methodologies or technologies.
One of the fundamental requirements of TS & HC is the electronic text. In most cases, support facilities for TS & HC function as repositories of data; they either hold electronic (textual) data themselves, or provide access to reliable/verified resources. By storing and providing data these centres play a vital role in the dissemination process of raw data and electronic tools for processing. Repositories often function as centres for the teaching of advanced methodologies, providing cutting-edge tools, methods and techniques.
In addition, collecting, maintaining, and using textual data provides one of the areas that allow academia and industry (e.g. academic repositories and commercial publishers) to merge know-how and resources. This should prove mutually beneficial.
In any case (and we refer also to the general recommendations below), we suggest that institutions should provide dedicated support for TS & HC in more structured ways than is generally the case at present.
Textual scholarship in the broad sense of the term is in one way or another involved in almost all imaginable disciplines. The following is a non-exhaustive list of elements which form a part of TS & HC or which use TS & HC:
General and task-bound skills require different kinds of training, and this depends to a large degree on the nature of the TS and/or HC course. There are general or degree courses, courses embedded into a wider framework of interdisciplinary research, and courses designed specifically to link up with existing courses in the humanities (and humanities credit systems). There is, however, a certain amount of agreement between the different institutions and countries when it comes to course content.
All of the course elements outlined here have computing at the core of the tools and methods taught in academic disciplines:
Note: For economy of expression, students who successfully complete courses in textual studies and computing of the kind proposed are referred to as TSC graduates in the remainder of this chapter.
Text capture and manipulation techniques, including mark-up, along with an understanding of metadata and version control, are important parts of document management, and are relevant in an increasing range of commercial and other spheres of activity.
Digital image capture and manipulation are likely to be part of a standard curriculum, and are increasingly important across all sectors.
Electronic publication is becoming the norm for all types of organisation. The techniques identified above are of course important for this activity, and in addition the TSC graduates would understand XML (and HTML) and would be proficient in the tools that are needed to create and maintain websites.
Multimedia digital resources of all kinds are already playing an important role in a number of areas, not least in the media and communications, entertainment, and cultural heritage sectors. TSC graduates who understand the basics of the creation, management and preservation of such resources, as well as issues of resource description and discovery, and information retrieval, will be well placed to find employment in these sectors.
However, there is a general point to be made, namely that graduates with good analytical skills, allied to the training of the imagination that comes from textual studies, have a significant contribution to make to the development of commercial products and the management of commercial operations.
Computing and information technology is having a profound effect on commercial and administrative relationships, e.g. between information providers (including academics) and publishers (who increasingly include broadcasters). TSC graduates would have the necessary basic understanding to play a role in the negotiation of the new relationships that are being developed.
There is strong encouragement for new partnerships to be developed between higher education and the wider world, including the commercial sector. TSC graduates would be well placed to play a role in imagining and creating these new ways of collaborating.
Those TSC graduates who develop their technical skills to a high degree may be able to make an important contribution to the development of new software tools, not only in terms of imagining and creating new types of tool, but also in ensuring that new tools properly take into account the people who will use them.
Those who remain in HE will have a better understanding of the resources and responsibilities of museums and galleries, and can help to ensure that these resources are more widely and more appropriately used in HE courses.
Those who work in the cultural heritage sector can make an important contribution to the development of high quality digital resources.
With their electronic publications skills, these graduates will also be able to assist in the dissemination of cultural heritage materials, for example within primary and secondary education, and to the wider public.
TSC graduates, particularly those with language interests or specialisations, will be familiar with multilingual corpora. TSC graduates are likely to have experience of using multilingual thesauri, particularly in relation to specialised terminology sets.
In institutions where TSC programmes and expertise are not available, collaborative arrangements could be made to take advantage of distance learning tools and frameworks, so that, for example, students could acquire at a distance the background and skills they need to pursue more advanced components.
The range of analytical and practical computing skills acquired by TSC graduates would help to ensure maximum mobility in seeking employment opportunities.
The European commitment to the preservation and promotion of all its languages and cultures depends increasingly on the appropriate application of computing techniques, including digitization and multimedia electronic publication. TSC graduates will be able to make an important contribution in this area.
European commercial success in a harshly competitive world depends on a workforce that is highly literate and skillful in the management and manipulation of electronic information. TSC graduates will form an important component of this workforce.
The recommendations made in this section arise from consultation and discussion over the course of the ACO*HUM project, and are drawn from experience in institutions where computing is well established in textual studies as well as concerns expressed in institutions where this is not the case.
Some of the recommendations are general in nature, and concern infrastructural issues that are likely to be addressed by institutions as part of a wider strategy covering provision for all students and/or teachers. However, it is important that the text-based disciplines should be fully supported in such provision; the time has passed when these disciplines could be regarded as having lesser requirements for computing facilities and tools.
A number of the recommendations concern specific action related to establishing a framework for more systematic gathering and maintenance of information.
Note: Following the practice in the previous section, the abbreviation TS means textual studies. Hence, TS students is used to designate students in the text-based disciplines in general, while TSC students designates students in these disciplines whose courses include applied computing components.
Biber, Douglas, Conrad, Susan & Reppen, Randi (1998): Corpus linguistics. Investigating Language Structure and Use. Cambridge: Cambridge University Press.
ECI/MCI (1994): European Corpus Initiative Multilingual Corpus 1. CD-ROM.
Habert, Benoît, Nazarenko, Adeline & Salem, André (1997): Les linguistiques de corpus. Paris: Armand Colin/Masson.
Kennedy, G. (1998): An introduction to corpus linguistics. Addison Wesley Longman Higher Education.
Lancashire, Ian (1991): The Humanities Computing Yearbook 1989-90. A Comprehensive Guide to Software and other Resources. Oxford: Clarendon.
Marcos-Marín, Francisco A. (1994): Informática y Humanidades. Madrid: Gredos.
McEnery, Tony & Wilson, Andrew (1996/1998): Corpus Linguistics. Edinburgh University Press.
Sinclair, John (1991): Corpus Concordance Collocation. Oxford: Oxford University Press.
Sperberg-McQueen, C. M. & Burnard, Lou (eds.) (1990): Guidelines for the encoding and interchange of machine-readable texts (TEI P1). Chicago/Oxford: ACH-ALLC-ACL Text Encoding Initiative.
Sperberg-McQueen, C. Michael & Burnard, Lou (eds.) (1994): Guidelines for Electronic Text Encoding and Interchange (Electronic Book Library Nr. 2). Providence: Electronic Book Technologies.
ICAME Journal. University of Bergen.
International journal of corpus linguistics. John Benjamins Publishing Company.
Literary and Linguistic Computing. Oxford University Press.
Research in Humanities Computing. Oxford University Press.
Association for Literary and Linguistic Computing (ALLC): http://www.allc.org/.
Computers in Teaching Initiative (CTI): http://www.cti.ac.uk.
European Language Resources Association (ELRA): http://www.icp.grenet.fr/ELRA/home.html.
Text Encoding Initiative (TEI): http://www.tei-c.org/.
Wittgenstein Archives (at Bergen): http://www.hit.uib.no/wab/.
Underlying all these components should be the basic principle that each component represents the introduction and use of formal methods, and that the analytical aspects in each case are of more fundamental importance than the design and implementation skills.
It is also highly desirable that the design and implementation skills are taught in as general a way as possible, so that maximum transferability is enabled, and attachment to the specific software tools used in the course is discouraged.
Initial set of core curriculum components: