However, some ideas are much more difficult to test, and require substantial
effort to code.
I'm currently designing what is called an application programming interface (API)
for statistical models which will make it much easier to write your own code.
For example, you will be able to write a program to identify the language of a
text (e.g. whether it is French, English etc.) or even compress it with only a
few lines of code.
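
As a rough illustration of the idea (this is not the API itself, which is still
being designed), here is a minimal sketch in Python. It assumes a very simple
order-1 character model with add-one smoothing as a stand-in for a real
compression model, and it picks the language whose model would encode the text
in the fewest bits. The names CharModel, code_length, identify_language and the
tiny training samples are all hypothetical and chosen only for this example.

```python
import math
from collections import Counter


class CharModel:
    """Order-1 character model with add-one smoothing.

    Assumption: a deliberately simple stand-in for a real
    compression model; not the API discussed in this message.
    """

    def __init__(self, training_text, alphabet_size=256):
        self.alphabet_size = alphabet_size
        # How often each character follows each one-character context.
        self.pair_counts = Counter(zip(training_text, training_text[1:]))
        # How often each character occurs as a context.
        self.context_counts = Counter(training_text[:-1])

    def code_length(self, text):
        """Bits an ideal coder driven by this model would need for text."""
        bits = 0.0
        for prev, cur in zip(text, text[1:]):
            p = (self.pair_counts[(prev, cur)] + 1) / (
                self.context_counts[prev] + self.alphabet_size
            )
            bits -= math.log2(p)
        return bits


def identify_language(text, models):
    """Return the name of the model that encodes text most cheaply."""
    return min(models, key=lambda name: models[name].code_length(text))


# Toy training samples (hypothetical; real use needs far more text).
SAMPLES = {
    "English": "the quick brown fox jumps over the lazy dog and "
               "the cat sat on the mat with the other animals",
    "French": "le renard brun rapide saute par dessus le chien paresseux "
              "et le chat est sur le tapis avec les autres animaux",
}
MODELS = {name: CharModel(sample) for name, sample in SAMPLES.items()}

if __name__ == "__main__":
    print(identify_language("the cat and the dog", MODELS))
    print(identify_language("le chien et le chat", MODELS))
```

The point of the sketch is only that, once a model can report a code length,
language identification reduces to a one-line comparison between models.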
The API will be based on state-of-the-art compression modelling techniques, but
it could be based on any statistical modelling method. I'll also be extending it
to include Viterbi-based algorithms, so that it will be fairly simple to write
programs that do spelling correction, OCR text correction, part-of-speech
tagging etc.
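
To show what a Viterbi-based algorithm involves, here is a minimal sketch in
Python of the standard Viterbi search over a small hidden Markov model, applied
to a toy part-of-speech tagging problem. It is not the planned API extension:
the function name viterbi, the probability floor, and every table below
(STATES, START_P, TRANS_P, EMIT_P) are assumptions invented for illustration.

```python
import math

# Toy model (hypothetical numbers, for illustration only).
STATES = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "cat": 0.4, "barks": 0.1},
    "VERB": {"barks": 0.6, "runs": 0.35, "dog": 0.05},
}


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable state sequence for the observations."""
    if not observations:
        return []

    def logp(p):
        # Assumed floor for unseen events so that log() stays defined.
        return math.log(max(p, floor))

    # Each column maps a state to (best log-probability, previous state).
    columns = [{
        s: (logp(start_p[s]) + logp(emit_p[s].get(observations[0], 0.0)), None)
        for s in states
    }]
    for obs in observations[1:]:
        last = columns[-1]
        column = {}
        for s in states:
            # Best predecessor for state s at this position.
            prev = max(states, key=lambda r: last[r][0] + logp(trans_p[r][s]))
            score = last[prev][0] + logp(trans_p[prev][s])
            column[s] = (score + logp(emit_p[s].get(obs, 0.0)), prev)
        columns.append(column)

    # Trace back from the best final state.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    words = "the dog barks".split()
    print(viterbi(words, STATES, START_P, TRANS_P, EMIT_P))
```

For spelling or OCR correction the same search would apply, with the hidden
states being the intended characters or words and the observations being the
text as actually typed or scanned.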
(Let me know if anyone is interested in this API, and I can post it to the list
for discussion.)
The same approach could be used to make it easier for linguists to write their
own software, i.e. by designing an API specifically tailored for corpus-based
research.
How much interest would there be out there in this? And what functions would
people find useful to put in this API?
Bill Teahan
Department of Computer Science
University of Waikato
Hamilton, New Zealand