Re: [Corpora-List] RE: Lesser (sic) used languages

From: Matthew Hurst (mhurst@intelliseek.com)
Date: Fri Feb 11 2005 - 19:54:29 MET

    A corpus of 8bn documents gives the following numbers:

    less known - 270k
    lesser known - 1,160k

    less used - 107k
    lesser used - 52.5k
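
As a rough illustration of how these counts compare, the ratio of "lesser X" to "less X" can be worked out as below. This is only a sketch: the names `counts_k` and `lesser_to_less_ratio` are made up for the example, "k" is read as thousands, and the second figure is read as 1,160k.

```python
# Counts reported in this message, in thousands ("k" read as thousands).
# The dictionary and function names are hypothetical, chosen for illustration.
counts_k = {
    ("known", "less"): 270.0,
    ("known", "lesser"): 1160.0,  # assumes the figure means 1,160k
    ("used", "less"): 107.0,
    ("used", "lesser"): 52.5,
}


def lesser_to_less_ratio(participle: str) -> float:
    """Return the number of 'lesser X' hits per 'less X' hit."""
    return counts_k[(participle, "lesser")] / counts_k[(participle, "less")]


for participle in ("known", "used"):
    ratio = lesser_to_less_ratio(participle)
    print(f"lesser/less {participle}: {ratio:.2f}")
# prints:
# lesser/less known: 4.30
# lesser/less used: 0.49
```

On these figures "lesser known" is roughly four times as frequent as "less known", while "lesser used" is about half as frequent as "less used".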

    Matt Hurst

    Nancy Ide wrote:
    >
    > On Feb 10, 2005, at 7:16 PM, Somers, Harold wrote:
    >
    >
    >>That's funny. My personal reaction to any marginally acceptable
    >>collocation that I personally don't use is that it's American ;-)
    >
    > ...and the Americans take the challenge!
    >
    > The 11m words of the ANC First release include "less used" only once,
    > and "lesser used" does not appear at all. But "lesser known" follows
    > the pattern found in the BNC for "lesser used":
    >
    > the lesser known people
    > the lesser known though no less captivating part of the Lake District .
    > lesser known but high - caliber musicians
    > lesser known but accomplished singers
    > lesser known works
    > the lesser known of the two
    >
    > But we do have one attributive use of "less known":
    >
    > a less known and common signification
    >
    > Go figure (as we Americans say). When our next 10m words are ready in a
    > month or two, we'll have another look.
    >
    > Nancy Ide
    >
    > =======================================================
    >
    > Nancy Ide
    >
    > Professor of Computer Science
    > Vassar College
    > Poughkeepsie, NY 12604-0520 USA
    > Tel: +1 845 437-5988 Fax: +1 845 437-7498
    > ide@cs.vassar.edu
    >
    > Chercheur Associe
    > Equipe Langue et Dialogue, LORIA/CNRS
    > Campus Scientifique - BP 239
    > 54506 Vandoeuvre-les-Nancy FRANCE
    > Tel: +33 (0)3 83 59 20 47 Fax: +33 (0)3 83 41 30 79
    > ide@loria.fr
    >
    > =======================================================


