1 The LOB Corpus

The Lancaster-Oslo/Bergen (LOB) Corpus is a million-word collection of present-day British English texts, compiled under the direction of Geoffrey Leech, University of Lancaster, and Stig Johansson, University of Oslo, in collaboration with Knut Hofland, Norwegian Computing Centre for the Humanities, Bergen. Like its American counterpart, the Brown Corpus (see Francis and Kucera 1979), it contains 500 text samples of approximately 2,000 words each, distributed over 15 text categories:

Text categories                            Number of samples in each category
                                               Brown Corpus    LOB Corpus

A   Press: reportage                                44             44
B   Press: editorial                                27             27
C   Press: reviews                                  17             17
D   Religion                                        17             17
E   Skills, trades and hobbies                      36             38
F   Popular lore                                    48             44
G   Belles lettres, biography, essays               75             77
H   Miscellaneous (government documents,            30             30
    foundation reports, industry reports,
    college catalogue, industry house organ)
J   Learned and scientific writings                 80             80
K   General fiction                                 29             29
L   Mystery and detective fiction                   24             24
M   Science fiction                                  6              6
N   Adventure and western fiction                   29             29
P   Romance and love story                          29             29
R   Humour                                           9              9

Total                                              500            500

For more details, see the LOB Corpus Manual of Information (Johansson et al. 1978). The present manual deals with the tagged versions of the corpus; for information on the sampling and the sources of the texts, the user must turn to the original manual.