Re: Corpora: Corpus Linguistics User Needs

Philip Resnik (resnik@umiacs.umd.edu)
Wed, 29 Jul 1998 13:30:51 -0400 (EDT)

I agree that some degree of programming ability should be considered a
necessary skill for linguists working on corpora. In fact, I'm
teaching a "programming for linguists" course this fall primarily for
that reason -- if anybody has good materials/exercises in LISP and/or
Perl, would you please drop me a line?

> 2. Produce libraries for Perl specific to Corpus linguistics
> - Although Perl is great for text, it would be nice to
> have libraries for reading common formats and dealing
> with such data. It is also apparent that a really
> good statistics package is in order.

FYI, Dan Melamed has written a set of Perl tools, available under the
GNU General Public License, that might be helpful -- see
http://www.cis.upenn.edu/~melamed/stats.html.

Also see CPAN (Comprehensive Perl Archive Network, http://www.perl.com/CPAN/).
There are some statistics modules available there; in particular, look
at http://www.perl.com/CPAN-local/modules/by-module/Statistics/.

Neither of these is really the comprehensive, standard library we all
seek, but they're a start.

Philip Resnik, Assistant Professor
Department of Linguistics and Institute for Advanced Computer Studies

1401 Marie Mount Hall
University of Maryland
College Park, MD 20742 USA

UMIACS phone: (301) 405-6760
Linguistics phone: (301) 405-8903
Fax: (301) 405-7104
http://umiacs.umd.edu/~resnik
E-mail: resnik@umiacs.umd.edu