
News archive

This page contains a record of past fellowships and events at BATMULT.

Fall 2006

New fellowship:

Mojca Stritar (University of Ljubljana): KUST: A Slovene learner's corpus. (October 2006 - January 2007). REPORT

The overall aim of my project is the theoretical foundation of KUST, the Slovene Learner Corpus (SLC). During the Marie Curie fellowship at BATMULT, two major scientific challenges were addressed: the development of a reasonable set of criteria for the collection and selection of learner material to be included in KUST, and the development of an appropriate error tagging system. The main aim was to digitize and tag the material to compile a pilot learner corpus of Slovene based on texts written by learners at different levels of competence and with different first languages. The purpose of the pilot corpus was to check and, if necessary, redefine the criteria for the collection, selection and documentation of learner materials, to develop and test mark-up conventions and the error tagging principles, and finally, to show some possibilities for the use of such corpora for language description, analysis and teaching. BATMULT thus offered me a unique insight into the Norwegian learner corpus ASK and the application of some of its solutions to the Slovene language situation.


Spring 2006

New fellowship:

Pavel Vondricka (Charles University, Prague): An object-relational dictionary model and its application to Norwegian nouns. (January - October 2006). REPORT


(photograph by Koenraad de Smedt)

The research project aims at a lexical description of Norwegian nouns, both monolingually and in a contrastive analysis with other languages. The background of this research is the desirability of reusable lexical resources supporting natural language processing (NLP). While traditional printed dictionaries are large and detailed, a lot of the information contained in them is implicit, cannot easily be digested by computers, and does not reflect all the contextual information of the words. In the context of the project, a system was designed and implemented which is more general than language-specific dictionary editors, but less general than formal systems used in NLP. This system has been tested on Norwegian nouns. The goal of the system is to provide a description of the morphological structure (and syntactic structure in the case of multi-word entries) of the lemmas together with information about their syntactic behavior, collocability and lexical semantic relations of their senses. In addition, the system was designed to handle wide language variability and detailed usage attributes for different variants, important factors that are often underestimated in human-oriented dictionaries and almost completely ignored by most NLP projects.


Fall 2005

New fellowship:

Ron van Kesteren (University of Nijmegen): Prelexical language cues in bilingual visual word recognition. (September 2005 - January 2006). REPORT

The project aims at clarifying the role of sublexical language cues in the word recognition process of bilinguals. It investigates an issue that currently receives a lot of international attention, namely whether bilingual readers are able to modify their word identification process on the basis of their expectations or the characteristics of the words they read. Currently available studies are ambiguous as to whether this is the case. If the reading process of bilinguals is sensitive to language-specific markers in the input, the presence of a letter such as 'å', which only occurs in Norwegian, might help Norwegian-English bilinguals to speed up their recognition process by limiting their word search to Norwegian words. The presence of bigrams that are normal in Norwegian, but very infrequent in English, such as 'hv' in 'hvit', might have the same effect. During the stay in Bergen, two kinds of experiments are conducted: a visual language decision task and a lexical decision task with Norwegian and English words, performed by Norwegian-English bilinguals. These experiments are designed to answer the question whether the presence of language-specific letters and/or bigrams influences reaction times and whether this effect is caused by modifications to the word recognition process.
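The notion of a language-specific letter or bigram can be made concrete with a small sketch. It is not the procedure used in the experiments; it only shows how stimulus words might be checked for such cues. The cue inventories and the function names `bigrams` and `norwegian_cues` are assumptions made for the example: only 'å' and 'hv' are taken from the description above.

```python
# Illustrative cue inventories. Only 'å' and 'hv' come from the project
# description; the other entries are assumptions added for this example.
NORWEGIAN_LETTERS = {"å", "æ", "ø"}
NORWEGIAN_BIGRAMS = {"hv", "kj"}


def bigrams(word):
    """Return the adjacent letter pairs of a word, in order."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def norwegian_cues(word):
    """List the language-specific letters, then bigrams, found in a word."""
    w = word.lower()
    letters = sorted(ch for ch in set(w) if ch in NORWEGIAN_LETTERS)
    pairs = sorted(set(bigrams(w)) & NORWEGIAN_BIGRAMS)
    return letters + pairs


# A word with a bigram cue, a word with a letter cue, and a word with none.
for word in ["hvit", "båt", "white"]:
    print(word, norwegian_cues(word))
```

A word that returns an empty list carries none of the listed cues, which is the kind of contrast the two decision tasks compare.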


Fall 2004

New fellowship:

Jana Zemljaric (University of Ljubljana): Building a Slovenian spoken corpus. (September - December, 2004). REPORT


(photograph by Koenraad de Smedt)

There are two main challenges in this PhD project aimed at building a Slovenian spoken corpus. The first is the development of a set of criteria for the collection and selection of spoken material to be included in a balanced non-opportunistic corpus. The second is developing guidelines for transcription of the spoken materials. These problems will be studied in a cross-linguistic comparison with materials and methods developed in Bergen and elsewhere, and accessible at BATMULT. About 100 minutes of recordings of Slovene will be used as test materials. Transcription aids such as Praat will be tested for the purpose. Finally, experiments will be done in aligning transcriptions with the original sound clips, using synchronization tools available at BATMULT.


Spring 2004

New countries eligible for BATMULT:

On May 1, 2004, ten new countries joined the European Union: Cyprus, the Czech Republic, Estonia, Hungary, Latvia, Lithuania, Malta, Poland, Slovakia and Slovenia. Candidates from these countries can now be selected for BATMULT fellowships!


New administrator at BATMULT:

Gisle Andersen replaces Kristin Bech as BATMULT's administrator and manager.

From January 1, 2004, Dr. Gisle Andersen replaces Kristin Bech as coordinator of the portfolio of language technology activities at Aksis. Andersen's responsibilities include managing the administration and practical running of the Bergen Advanced Training Site in Multilingual Tools (BATMULT), among several other tasks. Hence, Gisle is now the person to contact for inquiries about BATMULT. Like his predecessor, Gisle Andersen has a doctoral degree in English corpus linguistics from the University of Bergen. He has also worked in the language technology industry for three years, as a product developer for Norwegian concatenative speech synthesis.


Fall 2003

New fellowship:

Céline Poudat (University of Orléans): Contrastive analysis of scientific genres: Toward a characterization of French and English linguistics research papers. (October - December, 2003). REPORT


(photograph by Céline Poudat)

This PhD project is related to the KIAP project, which is aimed at a comparative study of academic texts in different domains and written in different languages. The research stay focuses on the morphosyntactic level because morphosyntactic annotation is well developed and many taggers are available for French and English. The research aims at comparing and assessing several French and English taggers in order to efficiently obtain morphosyntactic tags that will characterize the texts in a way that is relevant for the comparative study. Different parameters are taken into account in this perspective: contrastive relevance of the variables in the two languages, XML/TEI compatible encoding, and merging of the output tags. This research stay benefits from BATMULT expertise and tools in the areas of corpus tagging and text encoding.

Cristiano Furiassi (University of Torino): False anglicisms in Italian: Retrieval of examples in large corpora of written texts. (August - October, 2003). REPORT


(photograph by Koenraad de Smedt)

The average Italian speaker does not seem to be aware of the fact that many English-sounding or English-looking words are not English at all; instead they are autonomous coinages which are usually referred to as 'false anglicisms'. The aim of this research stay at BATMULT is to identify authentic examples of false anglicisms and subject these to a contrastive linguistic analysis. The ultimate goals of this PhD student are to arrive at a detailed typology and compile a dictionary of false anglicisms. The project will benefit from hands-on training in corpus linguistic tools such as those available in Bergen, and will also use the English language resources available in Bergen. The method will mainly consist of automated searching in monolingual Italian and English corpora and of automated string comparisons in Italian-English parallel corpora.

New administrative unit at the University of Bergen:

Until now, the administration of BATMULT has been taken care of by the Humanities Information Technologies centre (HIT) at the University of Bergen. This centre has now become a part of the Department of Culture, Language, and Information Technology (AKSIS). The administration of BATMULT will be continued by AKSIS.


Spring 2003

Natascia Leonardi (University of Macerata): An electronic edition of John Wilkins' Conceptual and Alphabetic Dictionary. (April - July, 2003). REPORT


(photograph by Koenraad de Smedt)

This PhD research project studies John Wilkins' Essay Towards a Real Character and a Philosophical Language (London 1668). This book elaborates a universal language intended as an instrument for precise and unambiguous communication. Several aspects of Wilkins' ambitious publication are interesting from a scientific viewpoint: not only its use of linguistic and cognitive terminology, but also its textual structure, consisting of two interrelated parts: a hierarchical taxonomy is presented in the form of Tables of the Universal Philosophy, while the lexical units are listed in the Alphabetical Dictionary. The connection between the different parts of Wilkins' Essay reveals a complex network of definitions and semantic relations, unparalleled in the epoch when this work was written and exhibiting modern features well ahead of its time. Natascia Leonardi's research is complemented by the development of a digitized version of Wilkins' work that is intended as a faithful reproduction of the original defining architectures. An XML encoding is applied to the tables, while the alphabetical dictionary makes use of lexicographic tools. The digitized integration of the two defining sections of Wilkins' work intends to fully reveal the potential of its articulated defining scheme and facilitates the reader's access to the different parts of Wilkins' Essay. Natascia Leonardi benefits from BATMULT's extensive expertise in text coding of complex documents.

Continued fellowship:

Luis Serrano Fernández (see below) obtained an extension allowing him to continue his fellowship in Bergen until February 22, 2003.


December 2-3, 2002

Event:

With the financial support of the University of Bergen, BATMULT organizes an:

International seminar on corpus alignment

This seminar, with the participation of prominent researchers from Norway and Sweden and two BATMULT fellows from Spain, aims at strengthening the BATMULT site and the cooperation between research groups.


Fall 2002

Luis Serrano Fernández (University of León): Translation, Film and Censorship: The Translation of Film Texts from English into Spanish 1975-1985. (Aug. 2002 - Feb. 2003). REPORT

Ignacio Pérez Álvarez (University of León): Translation, Prose and Censorship: The Translation of Narrative Texts from English into Spanish 1936-1962. (Aug. - Dec. 2002). REPORT

From left: Luis Serrano Fernández, Ignacio Pérez Álvarez (photograph by Koenraad de Smedt)


Both fellows work as PhD students in the context of a larger research project, Translation and Censorship in Spain (1939-1985). The research is aimed at a linguistic investigation of censorship in translation during and shortly after the Franco dictatorship. The current projects focus on corpora of English film scripts and narrative texts, respectively, and the different versions of their translations into Spanish. The method is based on defining a database of parallel texts and their linguistic analysis. Specific research questions focus on a characterization of the linguistic elements and levels of censorship (including self-censorship by translators). These PhD students benefit from Bergen's expertise and tools in the area of parallel corpora.





Last updated March 22, 2007 by Koenraad de Smedt