RE: [Corpora-List] Spanish - common words

From: Mark Davies (Mark_Davies@byu.edu)
Date: Wed Feb 04 2004 - 17:36:45 MET

    > I'm looking for a list (2000-5000 entries) of the most common
    > words in Spanish (standard written language). Preferably -
    > from the newspaper articles domain, but other sources would
    > do as well.

    Currently the best source is:
    -- Juilland, A. & Chang-Rodríguez, E. (1964). Frequency Dictionary of Spanish Words. The Hague: Mouton.
    It's based on about one million words of literature, nearly all from Spain.

    As for improvements on this dictionary, I'm finishing up a frequency dictionary of Spanish that will be published next year by Routledge. It is based on 20 million words of text, evenly divided among spoken, fiction, and non-fiction (including newspapers). Right now I'm completing the tagging and lemmatization of the corpus.

    If you want some preliminary data from this dictionary, check out my Corpus del Español (www.corpusdelespanol.org). Limit your searches to [+19misc] (the newspaper and encyclopedia register) and search by part of speech [*.v_inf], [*.adj], or whatever. It should give you some good rank-frequency listings.
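
    If you would rather build a rough list from your own newspaper text, a minimal sketch in Python might look like the following. This is not how the Corpus del Español produces its listings: the function name rank_frequency and the sample sentence are invented for illustration, and the code counts raw word forms with no tagging or lemmatization, so inflected forms of one verb are counted separately.

```python
import re
from collections import Counter


def rank_frequency(text, top_n=10):
    """Return (rank, word, count) tuples for the top_n most frequent word forms.

    Hypothetical helper for illustration only; it works on surface forms,
    not lemmas.
    """
    # Lowercase, then pull out runs of letters, including Spanish accented ones.
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(ordered[:top_n], start=1)
    ]


if __name__ == "__main__":
    # Made-up sample text; replace with the contents of your own corpus files.
    sample = "el perro y el gato y el pez"
    for rank, word, count in rank_frequency(sample, top_n=3):
        print(rank, word, count)
```

    On a real newspaper collection you would read the files in, pass the combined text to the function, and raise top_n to the 2000-5000 range asked about.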

    Good luck,

    Mark Davies

    =================================================
    Mark Davies
    Assoc. Prof., Linguistics
    Brigham Young University
    (phone) 801-422-9168 / (fax) 801-422-0906
    http://davies-linguistics.byu.edu

    ** Corpus design and use // Web-database scripting **
    ** Historical linguistics // Functional-typological grammar **
    ** Spanish and Portuguese historical and dialectal syntax **
    =================================================


