Armin,
> I was wondering if you could point me to good sentence splitters for the
> following languages: German, Russian
For Russian:
http://aot.ru/download/graphan.tar.gz (source in C++; a DLL is included in
http://aot.ru/download/shortrml.zip).
For most purposes I use a regexp (in JavaScript; conversion to Perl/Python is
straightforward):
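// Opening delimiters that may precede the first letter of a sentence.
var _DELIMS_OPEN_RAW_ = '(["</' ;
// Backslash-escape each delimiter for use inside a character class.
var _DELIMS_OPEN_ = '\\'+_DELIMS_OPEN_RAW_.split('').join('\\') ;
// Match [.!?]+ plus whitespace when an uppercase Cyrillic or Latin letter follows.
var sentenceSplitter = new RegExp(
'(?:\\.|\\!|\\?)+\\s+(?=['+_DELIMS_OPEN_+']?[А-ЯЁA-Z])' ) ;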
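A minimal usage sketch, in case it helps to see the regexp applied. The helper names splitSentences and splitKeepingPunctuation are hypothetical (they are not part of the snippet above), and the sample text is invented for illustration. The first helper relies on String.prototype.split, which discards the matched punctuation and whitespace; the second walks the matches by hand to keep the terminal punctuation.

```javascript
// Rebuild the splitter from the message so this block runs on its own.
const DELIMS_OPEN_RAW = '(["</';
const DELIMS_OPEN = '\\' + DELIMS_OPEN_RAW.split('').join('\\');
const sentenceSplitter = new RegExp(
  '(?:\\.|\\!|\\?)+\\s+(?=[' + DELIMS_OPEN + ']?[А-ЯЁA-Z])');

// Hypothetical helper: split() removes the matched delimiter,
// so the terminal punctuation of all but the last sentence is lost.
function splitSentences(text) {
  return text.split(sentenceSplitter);
}

// Hypothetical helper: same boundaries, but each sentence keeps
// its terminal punctuation; only the separating whitespace is dropped.
function splitKeepingPunctuation(text) {
  const re = new RegExp(sentenceSplitter.source, 'g');
  const sentences = [];
  let start = 0;
  let m;
  while ((m = re.exec(text)) !== null) {
    // The match is "punctuation run + whitespace"; cut after the punctuation.
    const punctEnd = m.index + m[0].replace(/\s+$/, '').length;
    sentences.push(text.slice(start, punctEnd));
    start = m.index + m[0].length;
  }
  if (start < text.length) {
    sentences.push(text.slice(start));
  }
  return sentences;
}

// Invented sample: two real boundaries, one quote-opened sentence,
// and a '?"' that is not followed by whitespace (so no split there).
const sample = 'Шёл дождь. Мы остались дома! "Почему?" спросил он.';
console.log(splitSentences(sample));
console.log(splitKeepingPunctuation(sample));
```

Because the lookahead only checks for an uppercase letter, an abbreviation followed by a capitalized word (for example "г. Москва") is also treated as a sentence boundary; a list of known abbreviations would be needed to avoid that.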
-- Victor Kapustin