Corpora: Sentence splitting

Mark Stevenson (m.stevenson@dcs.shef.ac.uk)
Tue, 2 Feb 1999 11:30:21 GMT

I have acquired some text which was produced by a speech recognition
system, so it is upper case throughout and has no punctuation. I would like to
split it into sentences for part-of-speech tagging and parsing.
I have looked at some of the existing literature on sentence splitting, but
everything I saw seemed to assume that the text still contained punctuation,
which isn't really suitable for what I need.

So I guess my question is: does anyone know of any techniques which could be
used to split a stream of words, without punctuation, into sentences?

Thanks in advance,
Mark

------------------------------------------------------------------------------
Mark Stevenson
Research Associate marks@dcs.shef.ac.uk
Natural Language Processing Group http://www.dcs.shef.ac.uk/~marks
Sheffield University (0114) 222 1899
-----------------------------------------------------------------------------