Re: [Corpora-List] calculation problem

From: Alexander Osherenko (osherenko@gmx.de)
Date: Thu Oct 20 2005 - 15:36:14 MET DST


    Hello Helene,

    if you assume that the occurrences in your corpus are distributed uniformly
    (the simplest assumption you can make), you can use the figure of 100:
    500 occurrences in 5 million words is 100 per million.
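
    As a minimal sketch of that proportional scaling (the function name
    expected_count is only illustrative, and the constant-rate assumption
    is exactly the uniformity assumption described above):

```python
def expected_count(observed, corpus_size, target_size):
    """Scale an observed count to a corpus of another size.

    Assumes the expression occurs at a constant rate throughout the
    corpus (the uniformity assumption); nothing else is modelled.
    """
    if corpus_size <= 0 or target_size < 0:
        raise ValueError("corpus sizes must be positive")
    return observed * target_size / corpus_size


# 500 hits in 5 million words, scaled down to 1 million words
print(expected_count(500, 5_000_000, 1_000_000))  # 100.0
```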

    Otherwise, if you use another distribution that better describes the
    behaviour of the occurrences, it will affect the number of occurrences in
    the 1 million word corpus, which will then probably not be exactly 100.
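
    To illustrate with one possible alternative (my choice for the sake of
    example, not the only option): if the occurrences are treated as
    independent events at a constant rate, the count in a 1 million word
    sample is roughly Poisson with mean 100, so its standard deviation is
    about sqrt(100) = 10. The sketch below uses a normal approximation to
    give a rough 95% range. Real corpora tend to be burstier than this
    model, so the true spread may well be larger.

```python
import math


def poisson_range(expected, z=1.96):
    """Approximate range for a Poisson count with the given mean.

    Uses the normal approximation mean +/- z * sqrt(mean); z = 1.96
    corresponds to roughly 95% coverage. This is an illustrative
    model, not a claim about how any real corpus behaves.
    """
    sd = math.sqrt(expected)
    return max(0.0, expected - z * sd), expected + z * sd


low, high = poisson_range(100.0)
print(round(low, 1), round(high, 1))  # 80.4 119.6
```

    So even under this simple model, a count anywhere between about 80
    and 120 in the smaller corpus would be unsurprising.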

    Cheers,

    Alexander

    > --- Original Message ---
    > From: "STENGERS, Helene" <Helene.Stengers@ehb.be>
    > To: CORPORA@UIB.NO
    > Subject: [Corpora-List] calculation problem
    > Date: Wed, 19 Oct 2005 14:14:55 +0200 (Romance Daylight Time)
    >
    > Hello dear list members,
    >
    >
    > I have an arithmetic question. If a particular expression occurs, let's
    > say, 500 times in a 5 million word corpus, can I assume that there will
    > be 100 of these expressions in a one million word corpus, or is there a
    > statistical (probability) formula which I should apply?
    >
    > Cheers,
    >
    > Helene Stengers
    >
    >

    


