[Corpora-List] Roget's Thesaurus as an Electronic Lexical Knowledge Base

From: Stan Szpakowicz (szpak@site.uottawa.ca)
Date: Mon Jun 12 2006 - 22:53:11 MET DST

         Roget's Thesaurus as an Electronic Lexical Knowledge Base

                          http://www.nzdl.org/ELKB/

    Roget's Thesaurus in Java, designed for Natural Language Processing, is
    now available for downloading. We distribute it under the GNU General
    Public License. The system is the graduate work of Mario Jarmasz
    <http://www.site.uottawa.ca/~mjarmasz/thesis/>, who implemented it with
    the proprietary lexical data in the 1987 Penguin Roget's. Olena Medelyan
    <http://www.cs.waikato.ac.nz/~olena/> has wonderfully reengineered
    Mario's system with the public-domain 1911 Roget's.

    The Roget's ELKB package includes four examples of NLP applications:
    detecting lexical chains in text, determining semantic distance between
    words and phrases, clustering words by meaning, and solving a word
    quiz.

    If you decide to use the ELKB, please put a link to the download site
    on your Web page. (See my home page for a nifty logo.)

    [The system is perfectly functional, but the 1911 data are antiquated.
    We are in discussion with Pearson Education, the owner of the 1987
    Penguin Roget's, about the fee structure and distribution mode that
    would enable the NLP community to acquire the much more attractive
    data.]

    --
    Stan Szpakowicz, PhD, Professor  613-562-5800/6687 /~\ The ASCII Ribbon
    SITE, Computer Science       szpak@site.uottawa.ca \ / Campaign Against
    University of Ottawa    www.site.uottawa.ca/~szpak  X     HTML Email
    


