Re: Corpora: Sentence splitting

Ted E. Dunning (ted@aptex.com)
Fri, 16 Oct 1998 18:40:49 -0700

>>>>> "hl" == Heui Seok Lim <limhs@nlp.korea.ac.kr> writes:

hl> ex. He said "where are you headed?"

This message is really just a silly nit, not a real contribution. The
real contribution, repeated by others, was to use real corpora to test
your sentence segmenter.

You can consider the sentence boundary to be signalled by the ? in
this example. It is good to keep the " as part of the sentence, but
the strongest signal is the question mark. The special handling of
the " is easiest to do at the point where you decide which punctuation
to treat as transparent.
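
To make the idea concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: the names TERMINATORS,
TRANSPARENT and split_sentences are hypothetical, and the particular
character sets are my guess at a reasonable starting point, not a
tested segmenter. The terminator (., ? or !) signals the boundary, and
any closing quote or bracket right after it is treated as transparent
and kept with the sentence.

```python
import re

# Assumption: these character sets are illustrative choices only.
TERMINATORS = ".?!"      # punctuation that signals a sentence boundary
TRANSPARENT = "\"')]"    # closers that may trail a terminator

_BOUNDARY = re.compile(
    "[" + re.escape(TERMINATORS) + "]+"    # one or more terminators
    + "[" + re.escape(TRANSPARENT) + "]*"  # optional transparent closers
    + r"(?=\s|$)"                          # then whitespace or end of text
)


def split_sentences(text):
    """Split text at terminators, keeping trailing closers attached."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    # Whatever follows the last boundary is kept as a final fragment.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences('He said "where are you headed?" I did not answer.'))
```

On the quoted example this keeps the " with the first sentence. It
will also happily split "Dr. Smith left." after "Dr.", which is the
sort of error that only shows up when you run a segmenter over real
corpora.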

The moral is that even the most trivial language processing is subject
to error and complexity. What *were* they thinking of when they
designed language!?