Re: [Corpora-List] Morphological segmentation & Morphosemantic parsing for French

From: Fiammetta NAMER (Fiammetta.Namer@univ-nancy2.fr)
Date: Fri Feb 04 2005 - 08:46:38 MET


    Hi John and Edina,

    I am currently developing a morphosemantic parser for French (DériF)
    based on linguistic constraints. Word decomposition is recursive and
    hierarchical, and any complex input (be it a neologism or an attested
    word) is assigned a pseudo-definition with respect to the morphological
    process that relates it to its base.

    DériF is being developed one morphological process type at a time (each
    type is a module: for instance, noun-to-adjective -ique suffixation is a
    module), so it does not account for all morphological processes yet. So
    far, it is able to parse around 30 word formation types, including
    suffixation, prefixation, conversion and neoclassical compounding.

    It is a simple Perl program that requires only Perl 5.8 to be installed.

    Recent DériF developments focus on biomedical terminology. The latest
    DériF version allows neoclassical compounds to be grouped into lexical
    classes by computing synonymy, hyponymy and approximation relations.

    Here is an example:
    ===========================================
    gastralgie/NOM==>
             [ [ gastr N* ] [ algie N* ] NOM ]
    (gastralgie/NOM, algie/N*)
    " douleur (du -- liée au) estomac "

    Constituants = /gastr/algie/

    gastralgie/NOM: synonym of gastrodynie/NOM, stomacalgie/NOM,
    stomacodynie/NOM, stomachodynie/NOM, (gastralgique/ADJ)
    gastralgie/NOM: subtype of abdominalgie/NOM
    gastralgie/NOM: see also entéralgie/NOM,
             entérodynie/NOM, gastrite/NOM,
             hépatalgie/NOM, hépatodynie/NOM,
             pancréatalgie/NOM
    =============================================
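
    For readers who want to process this kind of output, here is a minimal
    sketch (in Python, not part of DériF, which is written in Perl) that reads
    the bracketed analysis line into a tree and lists its constituents. It
    assumes that every bracket group ends with its category label, as in the
    example above; the names Node, parse_analysis and constituents are
    hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """One bracketed group: a category label plus its parts."""
    label: str  # e.g. "NOM" or "N*"
    children: List[Union["Node", str]] = field(default_factory=list)


def parse_analysis(text: str) -> Node:
    """Parse a string such as '[ [ gastr N* ] [ algie N* ] NOM ]'.

    Assumption: the last plain token inside each bracket pair is the
    category label of that group.
    """
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "[":
            raise ValueError("expected '['")
        pos += 1  # consume "["
        items: List[Union[Node, str]] = []
        while pos < len(tokens) and tokens[pos] != "]":
            if tokens[pos] == "[":
                items.append(parse_node())  # nested group
            else:
                items.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1  # consume "]"
        if not items or not isinstance(items[-1], str):
            raise ValueError("missing category label")
        return Node(label=items[-1], children=items[:-1])

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def constituents(node: Node) -> List[str]:
    """Collect the leaf forms from left to right."""
    leaves: List[str] = []
    for child in node.children:
        if isinstance(child, Node):
            leaves.extend(constituents(child))
        else:
            leaves.append(child)
    return leaves


tree = parse_analysis("[ [ gastr N* ] [ algie N* ] NOM ]")
print(tree.label)
print(constituents(tree))
```

    Because the parser is recursive, it follows the hierarchical nesting of
    the analysis to any depth, under the labelling assumption stated above.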

    More details in

    http://www.univ-nancy2.fr/pers/namer/Publis/MEDINFO2004.doc

    Unfortunately, it is still too soon to release a version of DériF because
    the results still have to be validated.

    As soon as the results for medical terminology are validated (i.e. in a few
    months, at the end of the French national UMLF project, coordinated by P.
    Zweigenbaum and supported by grants from the French Ministry of
    Education), they will be made freely available to the scientific community.

    Greetings
    Fiammetta

    At 08:17 27/01/2005 -0600, John A Goldsmith wrote:

    >In connection with the Linguistica project
    >(http://linguistica.uchicago.edu and
    >http://linguistica.uchicago.edu/alchemist.html), we are in the process
    >of building gold-standards
    >of morphological segmentation in a common XML format for a number of
    >languages. Our concern is more with morphological segmentation (and
    >allomorphy) and less with tagging of morphosyntactic features.
    >
    >
    >
    >I would very much appreciate pointers to any lists of words, in any
    >language, with an indication of correct morphological segmentation, or
    >pointers to software that does a good job of accomplishing this in
    >particular languages.
    >
    >
    >
    >Some morphological parsers focus on providing lemmatization or
    >morphosyntactic features, like Namer's FLEMM mentioned by Jean Véronis, as
    >far as I can tell; these do not help us with our task. In addition, since
    >our goal is to use these gold standards for testing, rather than for
    >training, accuracy is particularly important.
    >
    >
    >
    >I'll post a summary of all responses I receive. Thanks very much!
    >
    >
    >
    >John Goldsmith


