Re: [Corpora-List] Spanish reference corpus

From: Emrah (ozcanemrah@gmail.com)
Date: Fri Feb 02 2007 - 10:36:01 MET


    Thanks for the Spanish corpus. I have some other questions to ask you all...

       1. Do you know of a similar corpus for Turkish (other than the METU corpus,
       http://www.ii.metu.edu.tr/~corpus/)?

       2. Could you also recommend a morphological parser for Turkish that can
       identify the root and suffixes of a word (and even prefixes, which are
       few, such as eş- /esh/ in eşzamanlı --> synchronous)?

       3. An automatic lemmatizer for Turkish...

    Thanks in advance...

    -- 
    EMRAH ÖZCAN (M.A.)
    Araş. Gör. / Research Assistant

    Yıldız Teknik Üniversitesi / Yildiz Technical University
    Eğitim Fakültesi / Faculty of Education
    Yabancı Diller Eğitimi Böl. / Foreign Languages Teaching Dept.
    Davutpaşa Yerleşkesi / Campus at Davutpasa
    Esenler, İstanbul, Türkiye

    e-mail: eozcan {@} yildiz.edu.tr
    phone: +90 212 449 1616
    web page: http://www.dil.yildiz.edu.tr/emrah

    On 2/2/07, Serge Sharoff <s.sharoff@leeds.ac.uk> wrote:
    >
    > yes, the frequency list is also available:
    > http://corpus.leeds.ac.uk/frqc/internet-es-forms.num (for word forms)
    > http://corpus.leeds.ac.uk/frqc/internet-es.num (for lemmas, though you'd
    > better take the results of automatic lemmatisation with caution).
    >
    > BTW, the frequencies (the second column) are in terms of ipm (instances
    > per million words).
    >
    > Serge
    >
    > On Thu, 2007-02-01 at 14:17 +0100, Mario Crespo Miguel wrote:
    > > Thank you very much for helping me, but I think it is more
    > > convenient for me if the frequencies of the words of this open
    > > domain / general corpus could be obtained. Does anybody know if
    > > such an information is available some way? Best,
    > >
    > > Mario
    > >
    > > On 30 Jan 2007 16:10, Serge Sharoff <s.sharoff@leeds.ac.uk>
    > > wrote:
    > >
    > > > one answer is the Spanish Internet corpus with the interface from
    > > > http://corpus.leeds.ac.uk/internet.html
    > > > and the URL list
    > > > http://corpus.leeds.ac.uk/internet/final-url-es.gz
    > > >
    > > > This is a random snapshot of the Spanish Internet of about 120
    > > > million words, see
    > > > Sharoff, S (2006) Creating general-purpose corpora using
    > > > automated search engine queries. In Marco Baroni and Silvia
    > > > Bernardini, editors, WaCky! Working papers on the Web as Corpus.
    > > > Gedit, Bologna.
    > > > http://wackybook.sslmit.unibo.it/
    > > >
    > > > S
    > > >
    > > > On Tue, 2007-01-30 at 15:54 +0100, Mario Crespo Miguel wrote:
    > > >> Dear everybody,
    > > >>
    > > >> Thank you again for all the help that I always get with this
    > > >> mailing list, and this time I would like to ask if there is
    > > >> some reference / open-domain corpus for Spanish which is freely
    > > >> available and could be downloaded. Thank you in advance. Best
    > > >> wishes,
    > > >>
    > > >> Mario Crespo Miguel
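
The quoted message says the second column of the frequency lists is in ipm (instances per million words). A minimal Python sketch of reading such a list and turning ipm back into approximate raw counts might look like the following. It assumes a whitespace-separated, three-column layout (rank, ipm, word); only the meaning of the second column is stated in the thread, so the other two columns are an assumption, as are the function names. The sample lines are invented for illustration and are not taken from the real files.

```python
def parse_frequency_line(line):
    """Split one line into (rank, ipm, word).

    Assumed layout: rank, ipm, word, separated by whitespace.
    Returns None for any line that does not fit this layout.
    """
    parts = line.split()
    if len(parts) != 3:
        return None
    try:
        return int(parts[0]), float(parts[1]), parts[2]
    except ValueError:
        return None


def load_frequencies(lines):
    """Build a word -> ipm mapping, skipping lines that do not parse."""
    frequencies = {}
    for line in lines:
        parsed = parse_frequency_line(line)
        if parsed is not None:
            _rank, ipm, word = parsed
            frequencies[word] = ipm
    return frequencies


def ipm_to_count(ipm, corpus_size):
    """Convert instances per million words into an approximate raw count."""
    return ipm * corpus_size / 1_000_000


# Invented sample lines, for illustration only.
sample = ["1 50000.00 de", "2 30000.5 la"]
freqs = load_frequencies(sample)

# The thread gives the corpus size as about 120 million words.
approx_count = ipm_to_count(freqs["de"], 120_000_000)
```

Because the corpus size is only given as "about 120 million words", counts recovered this way are approximations rather than exact figures.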



    This archive was generated by hypermail 2b29 : Fri Feb 02 2007 - 10:33:26 MET