Re: Corpora: Corpus Linguistics User Needs

Maria Wolters (wolters@ikp.uni-bonn.de)
Wed, 29 Jul 1998 14:21:32 +0200

I think Henning Reetz is quite right in pointing out that there is no need
to reinvent the wheel, or, for that matter, rewrite Word Cruncher
every time we need its functionality. Defining and
coding efficient data structures and algorithms for handling
huge batches of text is quite demanding.

But there are a lot of low-level
tasks that can be performed pretty well by combining standard UNIX tools
such as sed and grep, a little Perl, and a basic understanding of pattern matching,
especially if the annotation of the corpus is easy to parse. Writing such scripts
should not be too difficult, even for people with very limited programming skills.
I think that's an important part of the point Geoffrey Sampson wanted to make.
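To illustrate how little code such a low-level task needs when the annotation is easy to parse, here is a minimal sketch. It is written in Python rather than Perl, and it assumes a hypothetical annotation format of whitespace-separated word/TAG tokens (e.g. "cat/NN"); the function name find_tagged is likewise made up for the example. It does what a grep over the tags would do: pull out every token whose tag matches a regular expression.

```python
import re

# Assumed (hypothetical) annotation format: whitespace-separated tokens of the
# form word/TAG, e.g. "the/DT cat/NN sat/VBD". The greedy first group means
# the LAST slash in a token separates word from tag ("and/or/CC" -> "and/or", "CC").
TOKEN_RE = re.compile(r"(\S+)/(\S+)")


def find_tagged(lines, tag_pattern):
    """Return (word, tag) pairs whose tag fully matches the regex tag_pattern."""
    tag_re = re.compile(tag_pattern)
    hits = []
    for line in lines:
        for word, tag in TOKEN_RE.findall(line):
            if tag_re.fullmatch(tag):
                hits.append((word, tag))
    return hits


if __name__ == "__main__":
    sample = ["the/DT cat/NN sat/VBD on/IN the/DT mats/NNS ./."]
    # All noun tokens (tags starting with NN).
    print(find_tagged(sample, r"NN.*"))  # [('cat', 'NN'), ('mats', 'NNS')]
```

Changing the tag pattern (say r"VB.*" for verbs) is all it takes to ask a different question of the same corpus, which is why a simple, regular annotation format matters so much.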

Returning to the question that triggered the discussion: maybe we should
distinguish between comparatively easy tasks, such as frequency counts, which
can be handled by a small Perl script (or a combination of lex and yacc, for
that matter), and more complicated ones, which really require a skilful
programmer.
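As an example of the "easy" end of that spectrum, a word frequency count fits in a few lines. This is a sketch in Python rather than Perl, and it assumes a deliberately naive tokenization (runs of letters, lowercased); a real corpus would need rules for hyphens, apostrophes, and any markup. The function name word_frequencies is hypothetical.

```python
import re
from collections import Counter

# Naive tokenization (an assumption): a word is a run of letters only,
# i.e. word characters minus digits and underscores.
WORD_RE = re.compile(r"[^\W\d_]+")


def word_frequencies(lines):
    """Count lowercased word tokens across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(w.lower() for w in WORD_RE.findall(line))
    return counts


if __name__ == "__main__":
    text = ["The cat saw the dog.", "The dog ran."]
    freqs = word_frequencies(text)
    print(freqs.most_common(2))  # [('the', 3), ('dog', 2)]
```

Because it reads line by line, the same function works on a file object (word_frequencies(open("corpus.txt"))), so memory use grows with the vocabulary rather than the corpus size.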

Maria Wolters
wolters@ikp.uni-bonn.de