Re: [Corpora-List] A question About Chomsky normal form

From: Miles Osborne (miles@inf.ed.ac.uk)
Date: Sun Sep 21 2003 - 12:14:10 MET DST


    you need to be clearer about how the inside-outside algorithm failed. if you
    mean that you experienced underflow problems, then a standard approach is to
    use logarithms and renormalise when necessary. if you mean that the image grew
    too large, or that it takes too long to converge, then you might be able to tie
    rules together (group them into equivalence classes). or, you might be able to
    use a monte carlo simulation (since the inside-outside algorithm computes
    expectations, an mc simulation could approximate such expectations). or, you
    might be able to compute the expectation step only partially, or perhaps not
    fully maximise at each round. a paper describing these partial variants is:

    A View of the EM Algorithm that Justifies Incremental, Sparse, and Other
    Variants (Radford Neal and Geoffrey Hinton)

    http://www.cs.toronto.edu/~radford/em.abstract.html
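
    to make the underflow remedy above concrete, here is a minimal sketch of the
    inside pass kept entirely in log space. it is an illustration under stated
    assumptions, not a reference implementation: the names logsumexp and
    log_inside, the dictionary layout of the grammar, and the start symbol "S"
    are all made up for this example.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")  # log(0)


def logsumexp(a, b):
    """Return log(exp(a) + exp(b)) without leaving log space."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = (a, b) if a > b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def log_inside(words, lexical, binary, start="S"):
    """Log probability of `words` under a CNF PCFG (hypothetical layout).

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    """
    n = len(words)
    # chart[i, j, A] = log inside probability of A spanning words[i:j]
    chart = defaultdict(lambda: NEG_INF)

    # Width-1 spans come from the lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[i, i + 1, a] = logsumexp(chart[i, i + 1, a], math.log(p))

    # Wider spans: products become sums of logs, sums become logsumexp.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = chart.get((i, k, b), NEG_INF)
                    right = chart.get((k, j, c), NEG_INF)
                    if left == NEG_INF or right == NEG_INF:
                        continue
                    score = math.log(p) + left + right
                    chart[i, j, a] = logsumexp(chart[i, j, a], score)

    return chart.get((0, n, start), NEG_INF)
```

    the loop over every binary rule for every split point is the naive version;
    with many non-terminals from binarization one would index rules by their
    children instead. the outside pass can be moved to log space in the same way.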

    it strikes me that people ought to be more interested in scaling-up our machine
    learning / statistical inference methods. why not take this chance to see how
    you can scale the inside-outside algorithm? don't forget to tell us about it ...

    Miles

    Quoting Heshaam Feili <hfaili@mehr.sharif.edu>:

    > Dear Colleagues,
    > I need a relatively large bracketed data set in CNF format to test
    > algorithms like inside-outside (Lari and Young 1990) on it. I chose the
    > NEGRA data set and am trying to convert it to CNF format.
    > After the conversion to CNF, a lot of non-terminals are created
    > because of binarization ... so the algorithm (inside-outside) failed.
    > What can I do in order to overcome this problem?
    > Best


