Re: [Corpora-List] Semantic Distances Revisited

From: Daniel Midgley (dmidgley@arts.uwa.edu.au)
Date: Mon Dec 02 2002 - 05:09:18 MET


    >>It's great stuff, although it's taxonomy-based.
    >>I was specifically interested in distributional methods.

    > And what is the difference - if it is possible to answer?

    I'll give it a try -- apologies in advance to more-experienced list members
    pained by my explanation.

    In a taxonomy, items are typically represented as nodes of a tree. So when
    you're measuring how similar two items are, you find them both on the tree,
    and then calculate how close they are, e.g. by counting the links on the
    path between them. (There are different
    ways to do this, and that's where the Hirst and Budanitsky article comes
    in.)
    It's a great approach, if you have the taxonomy already built for you.
    The pitfalls of making a taxonomy are well-known: it's a lot of work, your
    taxonomy may not hold across languages, and it's hard not to let your
    taxonomy reflect your biases.
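To make "find them both on the tree and calculate how close they are" concrete, here is a minimal sketch of the simplest taxonomy-based measure, plain edge counting. The taxonomy is a made-up toy (not WordNet), and the 1/(1 + path length) score is just one common way to turn a distance into a similarity. It is not any of the specific measures compared in the Hirst and Budanitsky article.

```python
# Hypothetical toy taxonomy: each child maps to its parent (IS-A links).
# Invented for illustration; not taken from WordNet or any real resource.
PARENT = {
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "poodle": "dog",
    "car": "artifact",
}


def ancestors(node):
    """Return the chain from node up to the root, node first."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def path_length(a, b):
    """Number of IS-A links on the shortest path between a and b in the tree."""
    steps_from_a = {n: i for i, n in enumerate(ancestors(a))}
    # Climb from b until we reach a node that is also an ancestor of a;
    # the first such node is the lowest common ancestor.
    for steps_from_b, n in enumerate(ancestors(b)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def path_similarity(a, b):
    """Edge-counting similarity: 1 / (1 + shortest path length)."""
    return 1.0 / (1 + path_length(a, b))


if __name__ == "__main__":
    for a, b in [("poodle", "cat"), ("dog", "cat"), ("dog", "car")]:
        print(a, b, path_length(a, b), round(path_similarity(a, b), 3))
```

In this toy tree, dog and cat meet at "animal" after two links, while dog and car only meet at the root after four, so dog/cat scores higher. Note how much the result depends on how the tree was built, which is exactly the pitfall mentioned above.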

    Distribution-based methods don't use a taxonomy; they attempt to find
    similarity based on the surrounding words. Again, there are many ways to do
    this, but the underlying assumption is that words that appear in similar
    contexts are similar to each other. For example, Beth Levin, in her work on
    English verb classes, makes the striking claim that verbs showing similar
    syntactic behaviour are semantically related. (The syntactic frames a verb
    appears in are, in effect, a kind of context.) Quite a revelation for a
    linguist such as myself, since linguists have traditionally studied syntax
    while putting semantics in the "too-hard" basket. Her work showed that
    syntax can be a key to semantics.
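Here's a minimal sketch of the distributional idea, assuming the simplest possible setup: each word is represented by counts of the words within a fixed window around it (a window of 2 is an arbitrary choice here), and two words are compared by the cosine of their count vectors. The corpus is a made-up toy; real systems use far more text and usually re-weight the counts (e.g. with PMI) so that frequent words like "the" don't dominate.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration.
CORPUS = [
    "the cat drank the milk",
    "the dog drank the water",
    "the cat chased the mouse",
    "the dog chased the cat",
    "the man drove the car",
    "the woman drove the truck",
]

WINDOW = 2  # assumed context window: 2 words on either side


def context_vectors(sentences, window=WINDOW):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[w] for w, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    vecs = context_vectors(CORPUS)
    for a, b in [("cat", "dog"), ("cat", "car"), ("car", "truck")]:
        print(a, b, round(cosine(vecs[a], vecs[b]), 3))
```

No taxonomy is involved: "cat" ends up closer to "dog" than to "car" purely because cats and dogs drink and chase things in this corpus, while cars and trucks get driven.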

    That's a really basic overview.
    Phil Resnik gives a thorough review of both kinds of methods in his
    dissertation. You can find it at:
    http://citeseer.nj.nec.com/resnik93selection.html
    His Lexical Acquisition talk at ACL 2002 changed my life. And may I add,
    he's one heck of a dancer.

    Feedback welcome.
    Daniel

    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    Daniel Midgley
    dmidgley@arts.uwa.edu.au
    + (61 8) 9371 3730
    http://www.cs.uwa.edu.au/~fontor


