RE: [Corpora-List] Resources concerning multilabel problem

From: Ralf Steinberger (ralf.steinberger@jrc.it)
Date: Fri Aug 18 2006 - 14:32:18 MET DST


    Dear Cecilie,

    We have recently made available the JRC-Acquis corpus, a multilingual
    (21 languages) document collection that is multi-labelled according to the
    Eurovoc thesaurus and aligned at paragraph level for each of the 210
    language pairs. It is available for download at:

          http://langtech.jrc.it/JRC-Acquis.html

    Furthermore, in the 'Publications' section of our web site
    (http://langtech.jrc.it/#Publications), you will find a number of papers on
    (typically multilingual) multi-label text categorisation applications
    (mostly from the years 2002-2004), including the following:

    Bruno Pouliquen, Ralf Steinberger & Camelia Ignat (2003). Automatic
    Annotation of Multilingual Text Collections with a Conceptual Thesaurus.
    In: Proceedings of the Workshop 'Ontologies and Information Extraction' at
    the Summer School 'The Semantic Web and Language Technology - Its Potential
    and Practicalities' (EUROLAN'2003). Bucharest, Romania, 28 July - 8 August
    2003. http://langtech.jrc.it/Documents/EuroLan-03_Pouliquen-Steinberger-et-al.pdf

    The text categorisation approach described in that paper is used as the
    major ingredient in our daily news analysis system NewsExplorer (freely
    accessible at http://press.jrc.it/NewsExplorer) to link related news across
    languages.

    I hope this helps. All the best,

    Ralf

    Ralf Steinberger (Ralf.Steinberger@jrc.it)

    European Commission - Joint Research Centre (JRC)
    IPSC - SeS - Language Technology (http://langtech.jrc.it,
    http://press.jrc.it/NewsExplorer)
    T.P. 267, Via Fermi 1
    21020 Ispra (VA), Italy
    Tel: +39 0332 78-6271
    Fax: +39 0332 78-5154
    Secretary: +39 0332 78-5648 or 9478

    New URL: http://langtech.jrc.it. The previous address
    http://www.jrc.it/langtech will only be valid for a few more months.

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Cecilie Desiree Widsteen
    Sent: 18 August 2006 11:09
    To: Corpora list
    Subject: [Corpora-List] Resources concerning multilabel problem

    Hello all!

    I am looking for resources (articles, books, web pages) concerning the
    multilabel (multiclass?) problem in the context of text classification,
    by which I mean the fact that a document can be classified into more than
    one category. I am especially interested in supervised learning
    algorithms, where the documents in the training set may belong to
    multiple classes.

    Regards,

    --
    Cecilie Widsteen
    Institute for Informatics,
    University of Oslo



    This archive was generated by hypermail 2b29 : Fri Aug 18 2006 - 15:30:49 MET DST