Re: [Corpora-List] calculation problem

From: Alexander Osherenko (osherenko@gmx.de)
Date: Fri Oct 21 2005 - 11:20:28 MET DST


    Dear Marco,

    I tried to give the simplest explanation.

    You say that "bad sampling" is the problem. I don't dispute that, but when
    bootstrapping you have to make some working assumptions if you want to get
    anywhere. For example: assume the sampling is good, take the simplest
    distribution, and calculate the results.

    If you are not satisfied with the system's results (which raises a problem
    of its own: what counts as a good measure of system quality?), you can
    always choose another distribution and increase the number of samples.
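
As an illustration of the two positions in this thread, here is a minimal Python sketch. It is not from the original discussion: the function names, the per-document counts, and the sample size of 100,000 tokens are all hypothetical, and reading "bootstrapping" as statistical resampling is an assumption. `scale_count` does the proportional extrapolation that the uniform assumption permits; `resample_estimates` resamples documents with replacement to show how far the extrapolated figure can move when occurrences are clumped in a few documents.

```python
import random


def scale_count(count, sample_size, target_size):
    """Extrapolate a count proportionally (valid only if occurrences are evenly spread)."""
    return count * target_size / sample_size


def resample_estimates(doc_counts, doc_sizes, target_size, n_resamples=1000, seed=0):
    """Resample documents with replacement and rescale the count each time."""
    rng = random.Random(seed)
    n = len(doc_counts)
    estimates = []
    for _ in range(n_resamples):
        picked = [rng.randrange(n) for _ in range(n)]
        total_count = sum(doc_counts[i] for i in picked)
        total_size = sum(doc_sizes[i] for i in picked)
        estimates.append(scale_count(total_count, total_size, target_size))
    return estimates


# Hypothetical sample: 10 documents of 10,000 tokens each, 10 occurrences in total.
sizes = [10_000] * 10
even_counts = [1] * 10            # one occurrence per document
clumped_counts = [10] + [0] * 9   # all occurrences in a single document

if __name__ == "__main__":
    print(scale_count(10, 100_000, 1_000_000))
    for label, counts in (("even", even_counts), ("clumped", clumped_counts)):
        est = resample_estimates(counts, sizes, 1_000_000)
        print(label, min(est), max(est))
```

Both hypothetical samples give the same point estimate for a 1 million word corpus, but only the evenly spread one is stable under resampling. That is one way to see why the sampling question and the distributional assumption are hard to separate.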

    Cheers,

    Alexander

    P.S. BTW, I don't think that Helene wanted a thorough mathematical
    explanation of her case.

    > --- Original Message ---
    > From: Marco Baroni <baroni@sslmit.unibo.it>
    > To: Alexander Osherenko <osherenko@gmx.de>
    > Cc: CORPORA@UIB.NO
    > Subject: Re: [Corpora-List] calculation problem
    > Date: Thu, 20 Oct 2005 19:20:41 +0200
    >
    > Dear Alexander,
    >
    > I'm a bit confused...
    >
    > > if you assume that occurrences in your corpus are distributed uniformly
    > > (actually the simplest probability distribution ever), you can take this
    > > 100 number
    > >
    > > Otherwise, if you use another distribution that better describes the
    > > behaviour of the occurrences, it will influence the number of occurrences
    > > in the 1 million corpus, which will probably not be 100.
    > >
    >
    > Isn't the problem rather one of (non-random) sampling, and not a matter of
    > the assumed distribution (which, as far as I can tell, is not assumed to be
    > uniform)?
    >
    > Regards,
    >
    > Marco
    >
    >
    >

    


