Corpora: New Norwegian Corpus

Janne Bondi Johannessen
Mon, 8 Nov 1999 16:34:15 +0100

** our apologies to those who receive multiple copies of this message


We are pleased to announce that the Oslo Corpus of Tagged Norwegian
Texts is now available to the public.


The Oslo Corpus consists of a bokmaal part (18.5 million words) and a
nynorsk part (3.8 million words). Both parts are tagged with the
disambiguating Constraint Grammar tagger developed here at the
University of Oslo. It is possible to search for words or word strings
and combine that search with a grammatical requirement, or to search
for a grammatical category without specifying any word string.

The texts contained in the Oslo Corpus are not meant to be
representative in any way; they are simply texts collected over the
years by the universities in Oslo and Bergen. Even so, there is a good
mixture of texts from newspapers and magazines, laws and public
reports, and novels.

One of the best features of the Oslo Corpus, we think, is the very
simple user interface. The Oslo Corpus is web-based, and requires no
background knowledge about tags or texts. The user only has to fill in
a box for a word (or word string), or click a box for a grammatical
category (or make a combined search).

In order to use the corpus, you need a user account and a password,
which you get by following the instructions on the corpus web site:


The Corpus will be presented at the KORFU conference in Växjö on 11
November, and at the MONS 8 conference in Tromsø on 20 November.

Janne Bondi Johannessen.


Professor Janne Bondi Johannessen Tel: +47-22 85 68 14
Tekstlaboratoriet E-mail:
Institutt for lingvistiske fag Fax: +47-22 85 69 19
Universitetet i Oslo Web:
P.b 1102 Blindern
0317 Oslo, Norway