Re: [Corpora-List] Finding representative terms

From: radev@umich.edu
Date: Mon Dec 26 2005 - 17:59:14 MET


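    You should consider using TF*IDF rather than IDF alone. First, compute
    IDF from a large external corpus, so that it is available even when you
    have only a single input document. Then compute TF for each word in
    each of your input documents, multiply the two, and rank the words by
    the resulting score. A typical outcome would be: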

             IDF    TF   TF*IDF
    the     0.01    20     0.20
    today   1.00     2     2.00
    Paris   5.00     2    10.00
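
For illustration, the recipe above could be sketched in Python roughly as follows. This is only a minimal sketch: the function names (idf_from_corpus, rank_terms), the log(N/df) form of IDF, and the default_idf fallback for words that never occur in the external corpus are assumptions made for the example, not something specified in this message.

```python
import math
from collections import Counter


def idf_from_corpus(corpus_docs):
    """Estimate IDF from an external corpus.

    corpus_docs is a list of documents, each a list of tokens.
    Assumption: IDF(w) = log(N / df(w)), one common variant among several.
    """
    n_docs = len(corpus_docs)
    doc_freq = Counter()
    for doc in corpus_docs:
        # Count each word at most once per document.
        doc_freq.update(set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def rank_terms(doc_tokens, idf, default_idf=0.0):
    """Score every word in one input document by TF * IDF, highest first.

    default_idf is a hypothetical fallback for words missing from the
    external corpus; how to treat such words is a separate design choice.
    """
    tf = Counter(doc_tokens)
    scores = {word: count * idf.get(word, default_idf)
              for word, count in tf.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Called with the illustrative IDF values from the table, e.g. rank_terms(["the"] * 20 + ["today"] * 2 + ["Paris"] * 2, {"the": 0.01, "today": 1.00, "Paris": 5.00}), this ranks "Paris" first, then "today", then "the".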

    Drago

    Delip Rao wrote:
    >
    > Hi,
    >
    > Is there any work that tries to find the most
    > important/representative words from a document? I have
    > tried using IDF but results were very poor. Also IDF
    > does not make sense if we have a single document and
    > want to get the most important term(s) out of it.
    >
    > Thanks!
    > Delip
    >

    -- 
    Dragomir R. Radev                                         radev@umich.edu
    Associate Professor of Information, Electrical Engineering and
    Computer Science, and Linguistics, the University of Michigan, Ann Arbor
    Phone: 734-615-5225   Fax: 734-764-2475    http://www.si.umich.edu/~radev
    


