
 Facilities and support

The coordinator of the training site, Professor Koenraad de Smedt, has overall responsibility for the training of the students. Applicants will be asked to describe their interests and propose a research topic, and will be assigned a scientific advisor accordingly.

The students will be offered full access to all necessary equipment, including personal computers, office consumables, office space, software, libraries, and any technical assistance needed for the research projects. A local administrator assigned to the MCTS will give the students the necessary logistical and administrative support with travel arrangements, grant payments, office space, and other practical matters. The UoB also has an Office for Foreign Students, which handles practical and administrative aspects of the stay, such as registration, paperwork with the immigration authorities, accommodation in University boarding houses, and access to language courses where desired.

In addition to the many resources owned by other institutions to which we have access, Bergen's own research resources, tools and training facilities offered to students include the following:

* ICAME (International Computer Archive of Modern and Medieval English). The HIT Centre hosts this archive, which contains 20 corpora with a total of 17 million words. The archive is commercially available on CD-ROM. In addition, HIT publishes the electronic version of ICAME Journal.

* COLT (Corpus of London Teenage Language). The corpus is a part of the British National Corpus, and consists of 472,000 words of transcribed text. 

* ENPC (English-Norwegian Parallel Corpus). This corpus comprises 100 pairs of texts containing 2.6 million words. 

* UNO (Ungdomsspråk i Norden - 'Youth language in the Nordic countries'). 

* Norsk Talemålskorpus ('The Norwegian Speech Corpus'). In this project, 18 hours of speech from Bergen, Voss and Tromøya have been collected and transcribed, and the corpus is searchable on the web. 

* NOT (Norwegian Terminological Database). The database is structured according to strict terminological principles, and contains material from 38 different fields. The database contains close to 30,000 term records with a total of approx. 90,000 terms, mainly in English and Norwegian.

* SCARRIE has, among other things, developed computational word lists of Norwegian, with grammar and style information: 360,933 word forms in 72,626 lemmas, accessible on the web.

* The NorGram project is currently developing an LFG grammar for Norwegian 'bokmål'.

* Norwegian Newspaper Corpus (131 million words, accessible on the web).

* The HIT Centre distributes the WordSmith program, which is used to make concordances and word lists for one or more texts.

* The Translation Corpus Aligner, developed by the HIT Centre, is a program for automatic alignment of sentences in parallel texts, and it is used in combination with the Translation Corpus Explorer, a search program for parallel texts developed by the University of Oslo.

* A tagger for Norwegian, developed by the HIT Centre in collaboration with the University of Oslo and the University of Stuttgart, Germany.

* The Name Recognizer, an automatic name recognizer for Norwegian, Swedish and Danish. 

* French-Norwegian parallel corpus. It consists of 30 text pairs, each of which contains approximately 12,000 words.


Official page
Last updated Sept. 25, 2001 by Kristin Bech