INTRODUCTION

In September 1984, a joint research project into the automatic assignment of intonation was undertaken at the University of Lancaster in collaboration with the Speech Research Group at IBM UK Scientific Centre. The first aim of the project was to collect samples of natural spoken British English which could be used as a database for analysis and for testing the intonation assignment programs. The result is the Spoken English Corpus (SEC), a machine-readable corpus of approximately 52,000 words of contemporary spoken British English.

Unlike most other corpora currently being used in the computational linguistics field, the SEC exists in various forms. Research into speech synthesis requires some study of the relationship between the orthographic and prosodic representations of speech. The SEC material, therefore, has been transcribed orthographically and prosodically, both these versions being generated independently from an unpunctuated version. A grammatically annotated version has been produced using the CLAWS word-tagging system to allow an analysis of the influence of syntax on prosody. Recordings of speech samples were produced mainly by IBM UK Scientific Centre using high-quality equipment. The tapes are of a standard suitable for instrumental analysis (for example, the extraction of F0).

It is impossible in a corpus of this size to include samples of every style of spoken English; instead, emphasis has been placed on collecting a sizeable sample of the type of spoken English which is suitable as a model for speech synthesis. Small samples of highly-stylized speech (for example, that used in poetry reading or a sermon) have been included, but will not be used in the initial testing of the intonation assignment programs.

The SEC in its various versions should prove most useful to those researching in the speech synthesis or speech recognition fields. It has already proved to be a valuable tool for teaching purposes at the University of Lancaster, providing students with the opportunity for close study of the phonetics of natural spoken English.

The SEC project was supported in 1984-5 by the University of Lancaster Humanities Research Fund and by IBM UK Ltd., and subsequently by IBM UK Ltd. alone. IBM have not only given financial support, but have actively participated in the project.

A large number of people have contributed to the project:

The project team comprised Dr G Knowles (University of Lancaster), Dr P Alderson (IBM), Dr B Williams (IBM) and L Taylor (University of Lancaster). Prof G Leech (University of Lancaster) and Prof G Kaye (IBM) initiated the project and maintained an active collaborative role in it. Additional help was provided by A Seil and N Campbell (IBM), and S Elliot, C Grover, and Dr E Briscoe (University of Lancaster).

The majority of texts in the corpus were obtained from the BBC, and thanks must go to Norma Jones in the BBC Sound Archives for her help in organising contracts, contacting speakers, and providing information for the three years of the project.

Thanks are also due to all those who gave free permission for their work and/or speech to be included in the corpus:

Elizabeth Seil (Story Time); Louise Botting (Money Box); James Cox (News); John Carlin (From our own Correspondent); Alina Dadlez and Decca International (extracts from Betjeman Reads Betjeman); Isabelle Dean (Time for Verse); Paddy Feeny (Review of the Year); Dr Robert Fox (Science and Belief in 18th C France); Susan Hampshire (Week's Good Cause); David Henderson (The Reith Lectures III); John Hollis (Listening & Reading); Martin Jarvis (Morning Story and Time for Verse); Juliet Johnson (Money Box); Catherine Kneafsey and Oxford University Press (extracts from Streamline English Series); Doris Lessing (author of Through the Tunnel); Colin Lyas (Nelson Mandela and Tom Stephenson speeches); Christopher Poole (Review of the Year); Brian Redhead (Week's Good Cause); Diana Ruault and Open University Educational Enterprises Ltd. (OU programmes - Modern Art, Science and Belief in 18th C France, Development of Fractions); Graham Seal (author of What shall we do if it rain?); Simon Taylor (Review of the Year); Janet Trewin (News); The Met Office (Weather Forecasts).

Special thanks to Molly Price-Owen of BBC World Service Sport for her help with the Review of the Year extracts.