Corpora: Software release: TiMBL 2.0

Jakub.Zavrel@kub.nl
Wed, 6 Jan 1999 15:31:44 +0100 (MET)

----------------------------------------------------------------------
Software release: TiMBL 2.0
Tilburg Memory Based Learner
ILK Research Group, http://ilk.kub.nl/
----------------------------------------------------------------------

(sorry if you get this more than once)

The ILK (Induction of Linguistic Knowledge) Research Group at Tilburg
University, The Netherlands, announces the release of a new version of
TiMBL, Tilburg Memory Based Learner (version 2.0).

TiMBL is a machine learning program implementing a family of
Memory-Based Learning techniques. TiMBL stores a representation of the
training set explicitly in memory (hence `Memory Based'), and
classifies new cases by extrapolating from the most similar stored
cases.
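
As an illustration of the basic idea (not TiMBL's actual code), the
following is a minimal Python sketch of memory-based classification
with a simple overlap metric. The function names, the toy data, and
the simplified tie handling are assumptions made for this example.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(memory, query, k=1):
    """Return the majority class among the k stored cases nearest to query.

    memory is a list of (features, label) pairs that is kept verbatim;
    nothing is abstracted away at training time.
    """
    ranked = sorted(memory, key=lambda case: overlap_distance(case[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three stored cases with symbolic features.
memory = [
    (("the", "dog", "barks"), "V"),
    (("a", "dog", "sleeps"), "V"),
    (("the", "big", "dog"), "N"),
]

print(classify(memory, ("the", "cat", "barks"), k=1))
```

The query differs from the first stored case in only one position, so
that case decides the classification.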

TiMBL features the following (optional) metrics and speed-up
optimizations that enhance the underlying k-nearest neighbor
classifier engine:

- Information Gain weighting for dealing with features of differing
importance (the IB1-IG learning algorithm).
- Stanfill & Waltz's / Cost & Salzberg's (Modified) Value Difference
metric for making graded guesses of the match between two
different symbolic values.
- Conversion of the flat instance memory into a decision tree,
and inverted indexing of the instance memory, both yielding
faster classification.
- Further compression and pruning of the decision tree, guided
by feature information gain differences, for an even larger
speed-up (the IGTREE learning algorithm).
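
The first two items in the list above can be sketched in a few lines
of Python. This is a textbook-style illustration under our own
assumptions (function names, toy data, no smoothing of the
probability estimates), not the code shipped with TiMBL.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(instances, labels, feature):
    """Class entropy minus the expected entropy after splitting on a feature."""
    by_value = defaultdict(list)
    for inst, label in zip(instances, labels):
        by_value[inst[feature]].append(label)
    remainder = sum(
        len(subset) / len(labels) * entropy(subset)
        for subset in by_value.values()
    )
    return entropy(labels) - remainder


def weighted_overlap(a, b, weights):
    """Overlap distance in which each mismatch costs the feature's weight."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def value_difference(instances, labels, feature, v1, v2):
    """Sum over classes of |P(class | v1) - P(class | v2)| for one feature.

    Both values must occur in the data for this feature.
    """
    counts = {v1: Counter(), v2: Counter()}
    for inst, label in zip(instances, labels):
        if inst[feature] in counts:
            counts[inst[feature]][label] += 1
    n1 = sum(counts[v1].values())
    n2 = sum(counts[v2].values())
    if n1 == 0 or n2 == 0:
        raise ValueError("both values must occur in the data")
    return sum(
        abs(counts[v1][c] / n1 - counts[v2][c] / n2) for c in set(labels)
    )


# Hypothetical toy data: feature 0 predicts the class, feature 1 does not.
instances = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["P", "P", "Q", "Q"]

weights = [information_gain(instances, labels, f) for f in range(2)]
print(weights)
print(weighted_overlap(("a", "x"), ("b", "y"), weights))
print(value_difference(instances, labels, 0, "a", "b"))
```

With these weights a mismatch on the uninformative feature costs
nothing, while a mismatch on the predictive feature costs one full
bit. The value difference treats two symbolic values as close when
they co-occur with the same classes.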

The current version is a complete rewrite of the software, and
offers a number of new features:

- Support for numeric features.
- The TRIBL algorithm, a hybrid between decision tree and nearest
neighbor search.
- An API to access the functionality of TiMBL from your own C++
programs.
- Increased ability to monitor the process of extrapolation from
nearest neighbors.
- Many bug fixes and small improvements.
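
To give a rough idea of what a hybrid between decision tree and
nearest neighbor search can look like, here is a heavily simplified
Python sketch. It is only an illustration: the function name, the
parameters `order` and `split`, the unweighted distance in the second
stage, and the toy data are all assumptions of this example and do
not describe TiMBL's implementation.

```python
from collections import Counter


def hybrid_classify(memory, query, order, split, k=1):
    """Classify with exact matching first and nearest neighbors second.

    memory: list of (features, label) pairs.
    order:  feature indices, most important first (e.g. by information gain).
    split:  number of leading features handled tree-style.
    """
    candidates = memory
    # Stage 1: descend tree-style, keeping only exact matches.
    for f in order[:split]:
        matching = [case for case in candidates if case[0][f] == query[f]]
        if not matching:
            # No branch for this value: fall back to the majority
            # class of the cases that matched so far.
            votes = Counter(label for _, label in candidates)
            return votes.most_common(1)[0][0]
        candidates = matching

    # Stage 2: nearest-neighbor search on the remaining features only.
    rest = order[split:]

    def distance(case):
        return sum(1 for f in rest if case[0][f] != query[f])

    ranked = sorted(candidates, key=distance)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data.
memory = [
    (("a", "x", "1"), "P"),
    (("a", "y", "2"), "Q"),
    (("b", "x", "1"), "Q"),
    (("a", "y", "1"), "Q"),
]

print(hybrid_classify(memory, ("a", "x", "1"), order=[0, 1, 2], split=1))
print(hybrid_classify(memory, ("c", "x", "1"), order=[0, 1, 2], split=1))
```

The first query matches on the leading feature and is then resolved
by its nearest neighbor among the remaining cases. The second query
has an unseen value for the leading feature and receives the majority
class of the whole memory.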

TiMBL accepts command-line arguments by which these metrics and
optimizations can be selected and combined. TiMBL can read the C4.5
and WEKA ARFF data file formats as well as column files and compact
(fixed-width delimiter-less) data.
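
For readers unfamiliar with these layouts, the Python sketch below
shows how one instance might be read from three of them. It assumes
that the class label is the last field, that C4.5-style lines are
comma-separated and may end in a period, and that every field in
compact data has the same width. The function names are made up for
this example; consult the reference guide for the exact conventions.

```python
def parse_c45(line):
    """Parse a comma-separated line such as 'the,dog,barks,V.'"""
    text = line.strip()
    if text.endswith("."):
        text = text[:-1]  # drop the optional terminating period
    fields = [field.strip() for field in text.split(",")]
    return tuple(fields[:-1]), fields[-1]


def parse_columns(line):
    """Parse a whitespace-separated line such as 'the dog barks V'."""
    fields = line.split()
    return tuple(fields[:-1]), fields[-1]


def parse_compact(line, width):
    """Parse a delimiter-less line made of fields of equal width."""
    text = line.rstrip("\n")
    fields = [text[i:i + width] for i in range(0, len(text), width)]
    return tuple(fields[:-1]), fields[-1]


print(parse_c45("the,dog,barks,V."))
print(parse_columns("the dog barks V"))
print(parse_compact("thdobaVB", 2))
```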

-[download]-----------------------------------------------------------

You are invited to download the TiMBL package for educational or
non-commercial research purposes. When downloading the package you
are asked to register, and express your agreement with the license
terms. TiMBL is *not* shareware or public domain software. If you have
registered for version 1.0, please be so kind as to re-register for
the current version.

The TiMBL software package can be downloaded from

http://ilk.kub.nl/software.html

or by following the `Software' link under the ILK home page at
http://ilk.kub.nl/ .

The TiMBL package contains the following:

- Source code (C++) with a Makefile.
- A reference guide containing descriptions of the incorporated
algorithms, detailed descriptions of the command-line options,
and a brief hands-on tutorial.
- Some example datasets.
- The text of the license agreement.
- A PostScript version of the paper that describes IGTREE.

The package should be easy to install on most UNIX systems.

-[background]---------------------------------------------------------

Memory-based learning (MBL) has proven to be quite successful in a
large number of tasks in Natural Language Processing (NLP). MBL for
NLP tasks (text-to-speech, part-of-speech tagging, chunking, light
parsing) is the main research theme of the ILK group. At one point
it was decided to build a well-coded and generic tool that would
combine the group's algorithms, favorite optimization tricks, and
interface desiderata. The current incarnation of this tool is version
2.0 of TiMBL.

We think TiMBL can make a useful tool for NLP research, and, for that
matter, for any other domain in machine learning.

For information on the ILK Research Group, visit our site at

http://ilk.kub.nl/

On this site you can find links to (PostScript versions of)
publications relating to the algorithms incorporated in TiMBL and to
their application to NLP tasks.

The reference guide ("TiMBL: Tilburg Memory-Based Learner, version
2.0, Reference Guide.", Walter Daelemans, Jakub Zavrel, Ko van der
Sloot, and Antal van den Bosch. ILK Technical Report 99-01) can be
downloaded separately and directly from

http://ilk.kub.nl/~ilk/papers/ilk9901.ps.gz

For comments and bug reports relating to TiMBL, please send mail to

Timbl@kub.nl

Please also send mail to this address if you do not wish to receive
further mail about TiMBL.

----------------------------------------------------------------------