Re: [Corpora-List] Automatic categorization of words.

From: Dominic Widdows (widdows@maya.com)
Date: Wed Mar 09 2005 - 15:28:09 MET


    Dear Cyrus,

    Be very careful about trying to derive any such fixed categories by
    analyzing corpora, because you will need some fairly powerful
    disambiguation. The examples you cite are typical: "rock" is often
    used figuratively, and also as an abstract description of a kind of
    music, while "justice" is used as a title referring to an actual
    person (a judge). (I haven't dug out corpus examples, but this would
    be easy to do if you're interested.) There is every reason to believe
    that this kind of ambiguity is the rule rather than the exception, at
    least for relatively common vernacular words.
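Digging out such examples is typically done with a keyword-in-context (KWIC) concordance. The sketch below is a minimal, hypothetical illustration using only Python's standard `re` module; the function name `kwic`, the `width` parameter, and the sample sentences are invented for illustration and are not real corpus data.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context (KWIC) lines for each whole-word,
    case-insensitive match of `keyword` in `text`."""
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

# Invented toy sentences, not real corpus data.
sample = """He threw a rock at the window. She loves classic rock and jazz.
The court awaited Justice Brown's opinion. They demanded justice."""

for line in kwic(sample, "rock") + kwic(sample, "justice"):
    print(line)
```

Even on this toy text, the aligned keyword column makes the concrete vs. genre uses of "rock" and the title vs. concept uses of "justice" easy to spot by eye.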

    Some of the examples above can be dealt with using syntactic tagging,
    chunking, etc., all of which are possible using relatively standard
    tools nowadays, at least for English. But it might be a lot more work
    than you had in mind.

    You may have considered this already, but in case you hadn't, I
    wanted to bring it to your attention: simply taking a list of words
    classified as concrete or abstract and tagging them as such wherever
    they occur in a corpus will almost certainly give disappointing
    results.
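To make the failure mode concrete, here is a hypothetical sketch of that naive approach: a fixed, type-level lexicon applied to every occurrence regardless of context. The two-word `LEXICON` and the function `naive_tag` are invented for illustration; a real study would presumably draw on a published concreteness or imageability norm list, but the context-blindness shown here would be the same.

```python
import re

# Hypothetical toy lexicon: one fixed label per word type, ignoring context.
LEXICON = {"rock": "concrete", "justice": "abstract"}

def naive_tag(sentence):
    """Label each lexicon word in `sentence` with its type-level category,
    without looking at how the word is actually used."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]

# Invented examples where the fixed label does not fit the usage:
print(naive_tag("She loves classic rock."))           # [('rock', 'concrete')] though it names a genre
print(naive_tag("Justice Brown wrote the opinion."))  # [('justice', 'abstract')] though it names a person
```

Because the lookup happens per word type rather than per occurrence, no amount of lexicon quality fixes these cases; some form of context-sensitive disambiguation is needed.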

    Best wishes,
    Dominic

    On Mar 9, 2005, at 12:03 AM, Cyrus Shaoul wrote:

    >
    > Dear List,
    >
    > I have been lurking for a while, but decided to post my first question
    > to the list today. I am trying to do research on the differences
    > between concrete and abstract words (e.g., "rock" and "justice").
    >
    > Does anyone know of any research or tools related to automatically
    > categorizing words into these types of categories (also called
    > imageability levels) based on corpus analysis?
    >
    > Thanks in advance,
    >
    > Cyrus Shaoul
    > University of Alberta
    >
    >



    This archive was generated by hypermail 2b29 : Wed Mar 09 2005 - 15:58:20 MET