[Corpora-List] Many thanks and a summary of all the responses.

From: Yuanyong Wang (wyy@cse.unsw.EDU.AU)
Date: Tue May 10 2005 - 02:54:19 MET DST


          Many thanks; the responses cleared up my confusion and provided
    valuable information on how to use the Senseval test data and on the
    issue of word sense disambiguation itself.

          All the responses are summarized below:

                                   The question:
    ...I am planning to conduct experiments
    on the Senseval-3 data. But after reading the answer key file, one thing
    appears a bit confusing: sometimes multiple sense tags are given for a
    single test case, and one of them may simply be the letter "U". ... how
    should those multiple-sense-tag cases be interpreted?

                               Response from Jordi:

       As far as I know, there are two special tags in SENSEVAL-II (and
    this probably also applies to SENSEVAL-III):

    P: meaning PROPER-NAME
    U: meaning UNASSIGNABLE

    Note that, as there were multiple annotators, you can find a disjunction
    of the tag "U" with a sense tag.

    You can find a description of the tags and how they were developed in
    "English Lexical Sample Task Description" by Adam Kilgarriff, available at
    http://www.itri.bton.ac.uk/events/senseval/englexsamp.ps
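    To make the tag scheme concrete, here is a minimal Python sketch that
    sorts answer-key entries into the cases described in this thread: a
    single sense, several senses, "U" alone, "U" in disjunction with regular
    senses, and "P". It assumes (this is not confirmed in the thread) that
    each key line is whitespace-separated, holding the lexical item, the
    instance id and then one or more tags. The function names, category
    labels and sample tags ("s1", "s2") are hypothetical, so check the real
    key file layout before relying on it.

```python
from __future__ import annotations

# Special tags described in this thread (assumed spellings "U" and "P").
UNASSIGNABLE = "U"
PROPER_NAME = "P"
SPECIAL_TAGS = {UNASSIGNABLE, PROPER_NAME}


def parse_key_line(line: str) -> tuple[str, str, list[str]]:
    """Split a key line into (lexical item, instance id, tags).

    Assumes a whitespace-separated layout; this is a hypothetical helper,
    not part of any Senseval distribution.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected item, instance id and at least one tag: {line!r}")
    lexelt, instance, *tags = fields
    return lexelt, instance, tags


def categorize(tags: list[str]) -> str:
    """Label a list of gold tags with one of the cases discussed in the thread."""
    senses = [t for t in tags if t not in SPECIAL_TAGS]
    if not senses:
        # No regular sense at all: only special tags remain.
        if PROPER_NAME in tags:
            return "proper-name"
        return "unassignable"
    if UNASSIGNABLE in tags:
        # "U" in disjunction with one or more regular senses.
        return "u-plus-sense"
    if len(senses) > 1:
        return "multiple-senses"
    return "single-sense"


if __name__ == "__main__":
    # Toy key lines with made-up tags; real Senseval tags look different.
    sample = [
        "art.n art.n.1 s1",
        "art.n art.n.2 s1 s2",
        "art.n art.n.3 U",
        "art.n art.n.4 s2 U",
    ]
    for line in sample:
        lexelt, instance, tags = parse_key_line(line)
        print(instance, categorize(tags))
```

    The label "u-plus-sense" is deliberately neutral: Jordi reads such
    entries as disagreement between annotators, while Adam reads them as an
    instance that fits a sense in some ways but not in others.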

                               Response from Adam:

    The trouble with word sense disambiguation is word senses. They just won't
    behave.

    Sometimes, the best that a human can do is to say that a corpus instance
    is related to more than one word sense (so it is tagged with multiple
    sense tags), that it is unassignable (U), or that it is like one of the
    senses in one way but not in others (a combination of U and one or more
    regular sense tags). This is the scheme we have used for English for all
    three Sensevals. You can find descriptions in the SENSEVAL-1 Special Issue
    of Computers and the Humanities 34 (1-2), amongst other places; here are
    links to papers that discuss it:

             Best
                     Adam

    2000 (with Joseph Rosenzweig) "English Framework and Results."
    <http://www.lexmasterclass.com/people/Publications/2000-KilgRosenzweig-Senseval1frame.pdf>
    Computers and the Humanities 34 (1-2), Special Issue on SENSEVAL.
    2000 (with Martha Palmer) "Introduction to the Special Issue on SENSEVAL."
    <http://www.lexmasterclass.com/people/Publications/2000-KilgPalmer-Senseval1Intro.pdf>
    Computers and the Humanities 34 (1-2). (Also guest editors for the
    Special Issue.)

                                 Response from Diana:

    You will probably hear from the task organisers directly. U is
    normally used as an "unassigned" tag when annotators are not sure what
    the appropriate tag is, and multiple tags are given when several senses
    are appropriate for the same instance. These details should be in the
    task descriptions in the proceedings (which you can obtain from the web
    page).
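    When scoring a system against such a key, one convention, assumed here
    rather than taken from the official Senseval scorer, is to count an
    answer as correct if it matches any of the gold tags for that instance,
    and to leave out instances whose gold tags are only special ones (U or
    P). A minimal Python sketch of that convention; the function name,
    instance ids and tags are hypothetical toy data:

```python
from __future__ import annotations

# Tags that carry no regular sense (assumed spellings).
SPECIAL_TAGS = frozenset({"U", "P"})


def score(gold: dict[str, list[str]], predicted: dict[str, str]) -> tuple[float, float]:
    """Return (precision, recall), counting a prediction as correct when it
    matches any gold tag for that instance.

    Assumption: instances whose gold tags are all special (U and/or P)
    are excluded from scoring altogether.
    """
    # Keep only instances with at least one regular sense tag.
    scorable = {
        inst: set(tags) for inst, tags in gold.items() if set(tags) - SPECIAL_TAGS
    }
    attempted = [inst for inst in scorable if inst in predicted]
    correct = sum(1 for inst in attempted if predicted[inst] in scorable[inst])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(scorable) if scorable else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data with made-up instance ids and tags.
    gold = {"a": ["s1"], "b": ["s1", "s2"], "c": ["U"], "d": ["s3", "U"]}
    predicted = {"a": "s1", "b": "s2", "c": "s1", "d": "s4"}
    print(score(gold, predicted))
```

    On the toy data, "b" counts as correct because "s2" is one of its two
    gold tags, "c" is skipped because it is tagged only "U", and "d" is
    wrong, giving precision and recall of 2/3. How U-only instances are
    handled changes the recall denominator, so that choice is worth
    reporting alongside any scores.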

                                Response from Ted:

        This is an interesting question, and there are really two ways to look
    at it. First, you may have cases where the use of the word is truly
    ambiguous.

    I wish I were a star.

    This most likely means a movie star, but if the surrounding context is
    vague or relates to science fiction or something, it might mean a star
    like our sun. In a case like this, where there is true ambiguity, a
    tagger might give multiple senses.

    There are also cases where the sense distinctions are vague or very finely
    grained. This is actually the main problem for taggers with respect to the
    Senseval data, and much of it revolves around the sense inventory. For
    example, the word "art" has several very closely related senses in
    WordNet, and it's very hard to pick them apart. So in a case like this, a
    tagger may have no choice but to pick multiple senses.

    So my advice is not just to look at the answer key, but also to look at
    the senses that the key entries point to. I am fairly confident that in
    many cases you'll see they are very finely grained distinctions that are
    hard to pull apart.

                Thanks again for all the replies.

         Regards
         Robin


