RE: [Corpora-List] mutual similarity

From: Adam Kilgarriff (adam@lexmasterclass.com)
Date: Wed Jun 28 2006 - 07:33:26 MET DST


    Stefano,

     

    This area has blossomed in recent years and there is ample work on the
    question.

     

    Greg Grefenstette explored it in detail in his thesis and associated book
    (Explorations in Automatic Thesaurus Discovery, Kluwer, 1994;
    <http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=527911>).
    Dekang Lin introduced a new
    measure which has been adopted by quite a few people (including myself) in
    his COLING 1998 paper. Lillian Lee compared various measures in her thesis;
    see her papers in Proc ACL 1999. Since 2003 there have been two excellent
    theses on the question, by Julie Weeds (Sussex Univ) and James Curran
    (Edinburgh Univ). Both have authored or co-authored various papers further
    exploring the topic - see e.g., Weeds and Weir in the latest CL, 31 (4)
    2005. Geffet and Dagan (COLING 2004) is another thought-provoking paper. In
    ACL-COLING 2006, Gorman and Curran move on to the next question: what are
    the computational issues in producing thesauruses from very large (billion+
    word) corpora?

     

    Regards,

     

    Adam

     

     

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Stefano Vegnaduzzo
    Sent: 28 June 2006 05:28
    To: CORPORA@UIB.NO
    Subject: [Corpora-List] mutual similarity

     

    Dear all,

    I would like to ask for pointers/literature/references/etc on the topic of
    mutual (or reciprocal) similarity. Here is what I mean by this:

    Given a term t0 and a set of terms t1 ... tn, a similarity measure M
    typically allows you to rank the terms t1 ... tn according to their similarity
    to t0.

    My question: Given a term t0 and a set of terms t1 ... tn, and a similarity
    measure M, and assuming a non-symmetric similarity relation (i.e., M(t1,t2)
    is different from M(t2,t1)), how do you compute the mutual similarity MS of
    t0 with respect to each term t1 ... tn, where M(t0,ti) is different from
    M(ti,t0)? In other words, I am interested in computing and ranking the
    mutual similarity of all pairs MS(t0,ti), where MS(t0,ti) is some function
    of M(t0,ti) and M(ti,t0).

    Cases of interest are for example those where M(t0,tX) is a bit higher than
    M(t0,tY) but M(tY,t0) is much higher than M(tX,t0), so I would like a mutual
    similarity measure to capture this by assigning MS(t0,tY) a higher score
    than MS(t0,tX).
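
To make the case above concrete, here is a minimal sketch of one possible MS. It is not taken from any of the works cited in this thread: the harmonic mean (with geometric mean and minimum as alternatives) is an assumed choice of combining function, the names mutual_similarity and rank_by_mutual_similarity are hypothetical, the scores are made-up toy values, and M is assumed to return non-negative numbers.

```python
from math import sqrt

def mutual_similarity(m_fwd, m_bwd, how="harmonic"):
    """Combine M(t0,ti) and M(ti,t0) into one symmetric score.

    Assumes both inputs are non-negative. The combining functions
    offered here are illustrative choices, not a standard definition.
    """
    if how == "harmonic":
        total = m_fwd + m_bwd
        return 2 * m_fwd * m_bwd / total if total > 0 else 0.0
    if how == "geometric":
        return sqrt(m_fwd * m_bwd)
    if how == "min":
        return min(m_fwd, m_bwd)
    raise ValueError(f"unknown combining function: {how}")

def rank_by_mutual_similarity(t0, candidates, M, how="harmonic"):
    """Return (term, MS(t0, term)) pairs, highest mutual similarity first."""
    scored = [(t, mutual_similarity(M(t0, t), M(t, t0), how)) for t in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy asymmetric scores (invented for illustration only):
# M(t0,tX) is a bit higher than M(t0,tY), but M(tY,t0) is much higher than M(tX,t0).
toy_scores = {
    ("t0", "tX"): 0.60, ("tX", "t0"): 0.10,
    ("t0", "tY"): 0.55, ("tY", "t0"): 0.50,
}

def M(a, b):
    """Look up the toy asymmetric similarity of a to b."""
    return toy_scores[(a, b)]

ranked = rank_by_mutual_similarity("t0", ["tX", "tY"], M)
for term, score in ranked:
    print(f"MS(t0,{term}) = {score:.3f}")
```

With these toy values tY is ranked above tX, because all three combining functions penalise a pair whose similarity is high in only one direction. An arithmetic mean would penalise such a pair far less.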

    I found very limited references in the literature. For example, D. Hindle,
    "Noun classification from predicate-argument structures" (1990), defines
    reciprocal similarity as the case where two terms are each other's most
    similar term, but this is way too restrictive for what I am interested in.
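
For comparison, the stricter notion described above (two terms being each other's most similar term) can be sketched as follows. This is only an illustration of that definition as paraphrased here, not Hindle's implementation; the function names, the vocabulary, and the scores are all hypothetical.

```python
def most_similar(term, vocab, M):
    """Return the term in vocab (other than `term`) that maximises M(term, other)."""
    others = [t for t in vocab if t != term]
    return max(others, key=lambda t: M(term, t))

def reciprocal_pairs(vocab, M):
    """Return sorted pairs of terms that are each other's most similar term."""
    nearest = {t: most_similar(t, vocab, M) for t in vocab}
    pairs = {
        tuple(sorted((t, n)))
        for t, n in nearest.items()
        if nearest[n] == t
    }
    return sorted(pairs)

# Toy asymmetric scores (invented for illustration only).
pair_scores = {
    ("boat", "ship"): 0.9, ("ship", "boat"): 0.8,
    ("boat", "car"): 0.2, ("car", "boat"): 0.5,
    ("ship", "car"): 0.3, ("car", "ship"): 0.4,
}

def sim(a, b):
    """Look up the toy asymmetric similarity of a to b."""
    return pair_scores[(a, b)]

reciprocal = reciprocal_pairs(["boat", "ship", "car"], sim)
print(reciprocal)
```

Here "car" has "boat" as its most similar term, but not the other way round, so only the boat/ship pair qualifies. That all-or-nothing outcome is what makes the definition too restrictive for ranking every pair.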

    Any help will be appreciated,
    thanks,

    Stefano Vegnaduzzo
     

      


