[Corpora-List] Request for help concerning an LSA problem

From: Cecilie Desiree Widsteen (cecilidw@student.iln.uio.no)
Date: Thu May 04 2006 - 10:29:06 MET DST


    Hello all,

    I'm currently trying to implement Latent Semantic Analysis (LSA) as
    part of an automatic classification system. I'm programming in Java
    and using the Jama matrix package for the matrix operations. I have
    run into some strange problems and would be grateful if anyone on this
    list could offer some help.

    My problem is this: I have implemented a class that builds a matrix
    representation of a corpus and performs SVD on the term-by-document
    matrix. Most of the operations are done by the Jama class "Matrix".
    This works fine, except that when I ran the program on various small
    test corpora (for instance the one from Chapter 15 of Manning and
    Schütze's book Foundations of Statistical Natural Language Processing),
    most of the right and left singular vectors contained the correct
    values but with the wrong (reversed) sign. For example, a vector that
    should have the values [-0.75, -0.28, -0.20, ...] is assigned the
    values [0.75, 0.28, ...]. Unfortunately, I have limited experience with
    linear algebra, so I find myself completely at a loss debugging this.
    As far as I understand, this means that my vectors point in the
    opposite direction from the one they should, but why this happens
    escapes me :)
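    A minimal sketch that may help narrow this down, assuming the factors
    are available as plain Java arrays (U and V with the singular vectors
    as columns, s holding the singular values). Singular vectors are only
    determined up to sign: replacing a matched pair u_i, v_i by -u_i, -v_i
    leaves every term s_i * u_i * v_i^T, and hence A = U S V^T, unchanged,
    so two correct SVD routines can return opposite signs. The class name
    SvdSigns and the convention in normalizeSigns (make the largest-magnitude
    entry of each left singular vector positive) are hypothetical choices,
    not part of Jama or of the book.

```java
// Sketch: singular vectors are defined only up to sign. Flipping a matched
// pair (u_i, v_i) leaves A = sum_i s_i * u_i * v_i^T unchanged.
// Assumed layout: u is m x k, v is n x k, singular vectors stored as columns.
final class SvdSigns {

    /** Rebuilds A = U * diag(s) * V^T from the column-vector factors. */
    static double[][] reconstruct(double[][] u, double[] s, double[][] v) {
        int m = u.length, n = v.length, k = s.length;
        double[][] a = new double[m][n];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                for (int c = 0; c < k; c++)
                    a[i][j] += u[i][c] * s[c] * v[j][c];
        return a;
    }

    /**
     * Hypothetical sign convention (not from Jama or the book): for each
     * column c, if the largest-magnitude entry of u's column is negative,
     * flip column c of BOTH u and v in place. Flipping both keeps A unchanged.
     */
    static void normalizeSigns(double[][] u, double[][] v) {
        int k = u[0].length;
        for (int c = 0; c < k; c++) {
            int best = 0;
            for (int i = 1; i < u.length; i++)
                if (Math.abs(u[i][c]) > Math.abs(u[best][c])) best = i;
            if (u[best][c] < 0) {
                for (double[] row : u) row[c] = -row[c];
                for (double[] row : v) row[c] = -row[c];
            }
        }
    }

    /** Largest absolute element-wise difference between two equal-shaped matrices. */
    static double maxAbsDiff(double[][] x, double[][] y) {
        double d = 0;
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x[i].length; j++)
                d = Math.max(d, Math.abs(x[i][j] - y[i][j]));
        return d;
    }

    public static void main(String[] args) {
        // Tiny made-up factorization with orthonormal columns.
        double r = Math.sqrt(0.5);
        double[][] u = {{-r, r}, {-r, -r}};
        double[] s = {3.0, 1.0};
        double[][] v = {{-1.0, 0.0}, {0.0, 1.0}};
        double[][] before = reconstruct(u, s, v);
        normalizeSigns(u, v);
        double[][] after = reconstruct(u, s, v);
        System.out.println("u[0][0] after normalization: " + u[0][0]);
        System.out.println("max |A_before - A_after|: " + maxAbsDiff(before, after));
    }
}
```

    If a flipped pair still reconstructs the original matrix, the sign
    difference should not matter for LSA similarity computations: flipping
    column i of V flips the i-th coordinate of every document vector alike,
    so cosines between documents are unchanged.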
    Any help, hints, tricks and the like are extremely welcome! I can also
    send over the source code on request.
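    For reference, the first step described above (building the
    term-by-document matrix) might look roughly like the following
    plain-Java sketch. The class TermDocMatrix, its raw-count weighting and
    its alphabetical row order are assumptions for illustration. The
    resulting double[][] can be wrapped with Jama's Matrix(double[][])
    constructor and decomposed with svd(), whose getU(), getS() and getV()
    return the factors. Jama's documentation describes its SVD for matrices
    with at least as many rows as columns, which holds when there are more
    terms than documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not the original poster's class: builds a raw-count
// term-by-document matrix (rows = terms in alphabetical order, columns = documents).
final class TermDocMatrix {
    final List<String> terms;   // row labels
    final double[][] counts;    // counts[term][doc]

    private TermDocMatrix(List<String> terms, double[][] counts) {
        this.terms = terms;
        this.counts = counts;
    }

    static TermDocMatrix build(List<List<String>> docs) {
        // Collect the vocabulary; TreeSet gives a deterministic, sorted row order.
        TreeSet<String> vocab = new TreeSet<>();
        for (List<String> doc : docs) vocab.addAll(doc);
        List<String> terms = new ArrayList<>(vocab);
        Map<String, Integer> row = new HashMap<>();
        for (int i = 0; i < terms.size(); i++) row.put(terms.get(i), i);

        // Count how often each term occurs in each document.
        double[][] counts = new double[terms.size()][docs.size()];
        for (int d = 0; d < docs.size(); d++)
            for (String t : docs.get(d)) counts[row.get(t)][d] += 1.0;
        return new TermDocMatrix(terms, counts);
    }

    public static void main(String[] args) {
        // Made-up toy documents, already tokenized.
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("moon", "car"),
            Arrays.asList("astronaut", "moon"),
            Arrays.asList("car", "truck", "car"));
        TermDocMatrix m = build(docs);
        for (int i = 0; i < m.terms.size(); i++)
            System.out.println(m.terms.get(i) + " " + Arrays.toString(m.counts[i]));
        // With Jama on the classpath, the SVD step would then be, e.g.:
        // Jama.SingularValueDecomposition svd = new Jama.Matrix(m.counts).svd();
    }
}
```

    Running main prints one row per term, for example
    "car [1.0, 0.0, 2.0]" for the made-up documents above.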

    Regards,

    --
    Cecilie D. Widsteen
    Department of Linguistics
    University of Oslo
    



    This archive was generated by hypermail 2b29 : Fri May 05 2006 - 13:44:25 MET DST