Re: Corpora: Italian corpus

From: Philip Resnik (resnik@umiacs.umd.edu)
Date: Wed Feb 21 2001 - 14:08:13 MET

    > I'm looking for a large corpus of Italian, and also either a smaller
    > POS-tagged corpus or a lemmatiser/POS-tagger. I'm planning to
    > participate in the Senseval-2 word sense disambiguation competition for
    > Italian and these are the resources that our system needs.

    For taggers, have a look at

      http://www.comp.lancs.ac.uk/computing/research/ucrel/public/1610.html

    which summarizes the replies to the same query a year or two ago. I'm
    sure some things have changed, but I know that the Italian TreeTagger
    is still available.

    Regarding corpora, I've been collecting a corpus of English-Italian
    pairs of translated Web pages. I'll post to the list when I have
    something to make available, which I hope will be quite soon, and the
    information will be available at http://umiacs.umd.edu/~resnik/strand.

      Philip

      ----------------------------------------------------------------
      Philip Resnik, Assistant Professor
      Department of Linguistics and Institute for Advanced Computer Studies

      1401 Marie Mount Hall            UMIACS phone:      (301) 405-6760
      University of Maryland           Linguistics phone: (301) 405-8903
      College Park, MD 20742 USA       Fax:               (301) 405-7104
      http://umiacs.umd.edu/~resnik    E-mail:            resnik@umiacs.umd.edu
