Summer job available in Waltham, MA, USA

Jeff Adams (jeffa@kurz-ai.com)
Wed, 15 May 1996 12:09:45 -0400

SUMMER JOB AVAILABLE
_____________________________________________________________

Kurzweil AI is looking for a temporary/summer employee to
help us with our text corpora. This would be ideal for a
student in NLP &/or computing.

The job involves working with existing Perl scripts & C programs,
and writing some original scripts & programs, to collect written
corpora & prepare them for constructing language models.

The company is located in Waltham, MA, near Boston.

_____________________________________________________________

Specifically, the job will involve the following sorts of tasks:

COLLECT GENERAL & SPECIALIZED CORPORA

* from CD-ROMs
* from web sites, email collections, & Usenet news
* from the LDC & other collaborative groups
* specialized medical & legal corpora
* foreign language corpora

FORMAT THE CORPORA

* organize text into appropriate files & directories
* delete headers, comments, HTML/SGML tags, & similar text
* identify & delete quoted material in message text
* remove or mask proper names & confidential information
  in medical & legal report text
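
As a rough illustration of the formatting tasks above, here is a minimal
sketch in Python (the existing scripts are in Perl & C; Python is used
here only for brevity). It assumes each message is an RFC 822-style text
whose header block ends at the first blank line, that quoted material is
marked with a leading ">", and that markup can be caught by a crude
angle-bracket pattern. The name strip_message is hypothetical, and real
corpora would need more careful handling than this.

```python
import re

# Assumption: anything between angle brackets is an HTML/SGML tag.
# This is crude and would also strip text such as <user@host>.
TAG_RE = re.compile(r"<[^>]+>")

# Assumption: quoted material is marked with a leading ">".
QUOTE_RE = re.compile(r"^\s*>")


def strip_message(raw):
    """Drop the header block, quoted lines, and markup tags from one message."""
    # Headers are assumed to end at the first blank line.
    head, sep, body = raw.partition("\n\n")
    if not sep:
        body = raw  # no blank line found; treat the whole text as body
    kept = []
    for line in body.splitlines():
        if QUOTE_RE.match(line):
            continue  # skip quoted material
        kept.append(TAG_RE.sub("", line))
    return "\n".join(kept).strip()
```

For example, a message with two header lines, one quoted line, and the
body line "See <b>this</b> page." would come back as "See this page."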

NORMALIZE THE CORPORA ACCORDING TO A GIVEN LEXICON

* mark out-of-vocabulary words
* normalize punctuation, phrases, hyphenations, & so on, e.g.:

    He won't take Mr. Hill's high-stress job in New York.
      |
    he won't take Mr. hill 's high - stress job in New_York .
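
The example above could be produced by something like the following
sketch, again in Python. Everything in it is an assumption made for
illustration: the tiny LEXICON, the "<OOV:...>" marker format, and the
splitting rules (trailing punctuation, possessive 's, hyphens). An actual
lexicon & rule set would be far larger and would be supplied on the job.

```python
import re

# Hypothetical toy lexicon, just large enough for the example sentence.
LEXICON = {"he", "won't", "take", "Mr.", "hill", "'s", "high", "-",
           "stress", "job", "in", "New_York", "."}


def lookup(tok):
    """Return the lexicon form of tok (as-is or lowercased), else None."""
    if tok in LEXICON:
        return tok
    if tok.lower() in LEXICON:
        return tok.lower()
    return None


def normalize_token(tok):
    """Split one whitespace-delimited token into lexicon entries."""
    hit = lookup(tok)
    if hit is not None:
        return [hit]
    if len(tok) > 1 and tok[-1] in ".,;:!?":      # trailing punctuation
        return normalize_token(tok[:-1]) + normalize_token(tok[-1])
    if len(tok) > 2 and tok.endswith("'s"):       # possessive / clitic
        return normalize_token(tok[:-2]) + normalize_token("'s")
    if "-" in tok and tok != "-":                 # hyphenation
        left, _, right = tok.partition("-")
        out = normalize_token(left) if left else []
        out += normalize_token("-")
        if right:
            out += normalize_token(right)
        return out
    return ["<OOV:%s>" % tok]                     # assumed OOV marker


def normalize(text):
    """Join multiword lexicon phrases, then normalize token by token."""
    for entry in LEXICON:
        if "_" in entry:
            spaced = re.escape(entry.replace("_", " "))
            text = re.sub(r"\b" + spaced + r"\b", entry, text)
    out = []
    for tok in text.split():
        out.extend(normalize_token(tok))
    return " ".join(out)


print(normalize("He won't take Mr. Hill's high-stress job in New York."))
```

With this toy lexicon the call prints the normalized line shown in the
example. A word missing from the lexicon, say "Zorp", would come out as
"<OOV:Zorp>".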

COMPUTE PRELIMINARY STATISTICS FROM THE CORPORA

* count n-grams for small values of n
* identify top-n lists of words for constructing lexicons
  for speech recognition
* compute perplexities, given various training & testing
  corpora & language models
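
To give a feel for these statistics, here is a small Python sketch. The
function names are hypothetical, and the perplexity is computed under an
add-one-smoothed unigram model, which is a simplifying assumption chosen
for brevity and not the language model used in production. vocab_size is
assumed to count every word type the model may see, including any
unknown-word type.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


def top_words(tokens, k):
    """Return the k most frequent words, a candidate lexicon."""
    return [word for word, _ in Counter(tokens).most_common(k)]


def unigram_perplexity(train, test, vocab_size):
    """Perplexity of test under an add-one-smoothed unigram model of train."""
    counts = Counter(train)
    total = len(train)
    log_prob = 0.0
    for word in test:
        # Add-one smoothing keeps unseen words from getting probability 0.
        p = (counts[word] + 1) / (total + vocab_size)
        log_prob += math.log2(p)
    # Perplexity is 2 raised to the average negative log2 probability.
    return 2 ** (-log_prob / len(test))
```

A typical use would be to call ngram_counts with n = 1, 2, 3 on a
normalized training corpus, take top_words as a first-pass lexicon, and
compare unigram_perplexity across different training & testing splits.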

_____________________________________________________________
Interested applicants should contact Jeff Adams
jeffa@kurz-ai.com
508-893-5151 x339

-- 
Jeff Adams
Language Modeling Scientist
Kurzweil Applied Intelligence
http://www.kurz-ai.com/people/jeffa