Re: [Corpora-List] Brown Corpus

From: Jean Veronis (Jean.Veronis@up.univ-mrs.fr)
Date: Fri Jun 17 2005 - 17:03:59 MET DST


    Hi Adam,

    Although I agree on the same-size sample design, I am less convinced by
    the use of the mean and standard deviation on corpora (as well as
    t-score and a few others). The distributions are so strongly skewed that
    these measures are probably not advisable. Without getting into anything
    too complicated, the median and measures based on it, like the MAD (median
    absolute deviation), and in general what's called "robust statistics",
    seem preferable to me.
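
    To make the point concrete, here is a minimal sketch in Python. The
    counts are made up for illustration (one word's frequency in ten
    equal-sized samples, with a single sample where it is very frequent),
    and the helper name mad is my own, not taken from any corpus tool.

```python
from statistics import mean, median, stdev


def mad(values):
    """Median absolute deviation: the median of |x - median(x)|."""
    values = list(values)
    centre = median(values)
    return median(abs(x - centre) for x in values)


# Hypothetical per-sample counts of one word; the last sample is an outlier,
# the kind of burst that is common in skewed corpus frequency data.
counts = [2, 3, 1, 2, 4, 3, 2, 1, 3, 95]

print("mean  :", mean(counts))
print("stdev :", stdev(counts))
print("median:", median(counts))
print("MAD   :", mad(counts))
```

    On these invented numbers the mean is 11.6 and the standard deviation
    is roughly 29, both driven almost entirely by the one outlying sample,
    whereas the median (2.5) and the MAD (0.5) still describe the nine
    typical samples.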

    --j
      http://aixtal.blogspot.com


