Re: Corpora: Corpus Linguistics User Needs

Ted E. Dunning (ted@aptex.com)
Fri, 31 Jul 1998 14:52:04 -0700

here are some specific answers to philip's questions from my own
perspective. your mileage may vary.

>>>>> "pr" == Philip Resnik <resnik@umiacs.umd.edu> writes:

pr> - Is programming experience in a classroom/class-project
pr> setting enough to be competitive, or do students need a few
pr> successful non-classroom programming projects on their c.v. to
pr> get serious consideration (if so, how many)?

classroom work is barely sufficient to get hired here as somebody with
no more than an undergraduate degree. somebody with a higher degree
will definitely need better experience. i would count either
experience in a very strong research lab, with a demonstrated ability
to produce software, or an industrial position.

some of the strongest candidates i have seen had worked for a time
(not long) as software engineers and then decided they wanted to get
down to the meat of the matter rather than just implement other
people's ideas.

pr> - Does programming expertise in any reasonably useful
pr> language (e.g. C, C++, Java, Perl, LISP, Prolog) generally
pr> suffice, with the assumption that students can learn new
pr> languages on the job, or do students absolutely need
pr> experience in the language they'll be using?

yes. this suffices.

my order of interest in terms of language experience would be
(roughly)

highest> TCL or Perl
Java
C
Lisp
Prolog
lowest> C++

some may find this list a bit idiosyncratic. the reason i put the
scripting languages so high is that I generally want somebody who
understands how to glue software together rather than somebody who
thinks they need to make the stuff being glued together. i put C++ at
the bottom of the list because (in this category of applicant) having
only C++ experience can mean only a few things, none of them very
helpful for a computational linguist. i put Java quite high because
it tends to indicate an interest in change and new things rather than
because of the merit of the language itself for these tasks. I put
lisp rather low because almost all of the people who claim Lisp
experience have actually only programmed a solution to the cannibals
and missionaries problem. if somebody who actually *knew* lisp or
prolog interviewed, i would consider them much more strongly. for a
programmer position, this list would be considerably different, of
course.

pr> - How much of an understanding of fundamental computer
pr> science concepts (e.g. basic computer architecture,
pr> fundamental algorithms, computational complexity), if any, do
pr> you consider a minimum for a student to be competitive in a
pr> computational linguistics job?

an introductory level of understanding is absolutely critical. my
list of what is critical to understand is, however, quite different
from what most curricula provide. what i look for includes the
following:

* understanding of what hashing and sorting really mean
* decent understanding and intuition for statistics
* basic understanding of what RAM and disk and CPU are
* basic understanding of software engineering goals and tools
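
as a rough illustration of the first item, here is a minimal sketch
(not from the original discussion) of counting word frequencies two
ways: once with a hash table and once by sorting. the function names
and the toy sentence are made up for the example.

```python
from collections import Counter
from itertools import groupby


def counts_by_hashing(tokens):
    # One pass over the tokens; each word is looked up in a hash table.
    # Expected time is linear in the number of tokens.
    return dict(Counter(tokens))


def counts_by_sorting(tokens):
    # Sorting puts equal words next to each other, so each run of
    # identical words can be counted in a single scan. The sort
    # dominates the cost at O(n log n).
    return {word: sum(1 for _ in run) for word, run in groupby(sorted(tokens))}


# Hypothetical toy corpus, used only to show both approaches agree.
tokens = "the cat saw the dog and the dog saw the cat".split()
assert counts_by_hashing(tokens) == counts_by_sorting(tokens)
```

both give the same counts. the practical difference is that the hash
table has to fit in RAM, while the sorted version also works when the
data only fits on disk and is sorted externally.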

I care not at all (in the context of such a position) for the content
of most compiler classes, computer language theory or implementation,
parsing theory, detailed complexity theory, and formal proofs of
correctness. for a programming position, i would rate formal proofs
and detailed knowledge of algorithms and language design much more
highly.

hope this helps somebody get a job!