Corpora: Sentence splitting

Mark Stevenson
Tue, 2 Feb 1999 11:30:21 GMT

I have acquired some text which has been produced by a speech recognition
system and so is upper case throughout and has no punctuation. I would like to
be able to split this into sentences for part-of-speech tagging and parsing.
I have looked at some of the existing literature on sentence splitting, but
everything I saw seemed to assume that the text still contained punctuation,
which isn't really suitable for what I need.

So I guess my question is: does anyone know of any techniques which could be
used to split a stream of words, without punctuation, into sentences?

Thanks in advance,

Mark Stevenson
Research Associate
Natural Language Processing Group
Sheffield University (0114) 222 1899