RE: [Corpora-List] Spanish - common words

From: Mark Davies
Date: Wed Feb 04 2004 - 17:36:45 MET

    > I'm looking for a list (2000-5000 entries) of the most common
    > words in Spanish (standard written language). Preferably -
    > from the newspaper articles domain, but other sources would
    > do as well.

    Currently the best source is:
    -- Juilland, A. & Chang-Rodríguez, E. (1964). Frequency Dictionary of Spanish Words. The Hague: Mouton.
    It's based on about one million words of literature, nearly all from Spain.

    As for improvements on this dictionary, I'm finishing a frequency dictionary of Spanish that Routledge will publish next year. It is based on 20 million words of text, evenly divided among spoken, fiction, and non-fiction (including newspapers) registers. Right now I'm completing the tagging and lemmatization of the corpus.

    If you want some preliminary data from this dictionary, check out my Corpus del Español. Limit your searches to [+19misc] (the newspaper and encyclopedia register) and search by part of speech, e.g. [*.v_inf], [*.adj], or any other tag. That should give you some good rank-frequency listings.
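
    If you end up building a list from your own newspaper texts instead, a rank-frequency listing is easy to sketch. The snippet below is only an illustration, not how the dictionary or the Corpus del Español is built: the function name `rank_frequency` and the sample sentence are made up, and it counts raw word forms rather than tagged, lemmatized entries.

```python
import re
from collections import Counter

def rank_frequency(text, top_n=10):
    """Return (rank, word, count) tuples for the top_n most frequent word forms.

    Hypothetical helper for illustration: it counts surface forms only,
    with no part-of-speech tagging or lemmatization.
    """
    # Lowercase, then keep runs of letters, including accented Spanish letters.
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered[:top_n], start=1)]

# Made-up sample text; replace with your own newspaper articles.
sample = "El perro come. La casa es grande y el perro es pequeño."
for rank, word, count in rank_frequency(sample, top_n=3):
    print(rank, word, count)
```

    On real data you would want far more text than this, and you would probably want to group inflected forms under a lemma before ranking.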

    Good luck,

    Mark Davies
    Assoc. Prof., Linguistics
    Brigham Young University
    (phone) 801-422-9168 / (fax) 801-422-0906

    ** Corpus design and use // Web-database scripting **
    ** Historical linguistics // Functional-typological grammar **
    ** Spanish and Portuguese historical and dialectal syntax **

    This archive was generated by hypermail 2b29 : Wed Feb 04 2004 - 17:50:29 MET