Hi Duncan,
for OCR problems, the following references may be useful:
1. Ringlstetter, Christoph, Klaus U. Schulz, and Stoyan Mihov. 2006.
Orthographic Errors in Web Pages: Towards Cleaner Web Corpora.
Computational Linguistics 32(3): 295-340.
2. Strohmaier, Christian, Christoph Ringlstetter, Klaus U. Schulz,
and Stoyan Mihov. 2003a. Lexical postcorrection of OCR-results:
The web as a dynamic secondary dictionary? In Proceedings of the
Seventh International Conference on Document Analysis and
Recognition (ICDAR 03), pages 1133–1137, Edinburgh.
3. Strohmaier, Christian, Christoph Ringlstetter, Klaus U. Schulz,
and Stoyan Mihov. 2003b. A visual and interactive tool for
optimizing lexical postcorrection of OCR results. In Proceedings
of the IEEE Workshop on Document Image Analysis and Recognition,
DIAR’03, Madison, WI.
4. Ringlstetter, Christoph. 2003. OCR-Korrektur und Bestimmung von
Levenshtein-Gewichten. Master’s thesis, LMU, University of Munich.
Mirko Tavosanis
Dipartimento di Studi italianistici
Universita' di Pisa
http://www.humnet.unipi.it/ital/