Re: [Corpora-List] Resources concerning multilabel problem

From: radev@umich.edu
Date: Fri Aug 18 2006 - 15:42:58 MET DST


    Look at this paper:

    http://citeseer.ist.psu.edu/8956.html

    Error-Correcting Output Coding for Text Classification (1999)
    Adam Berger

    See also:

    http://citeseer.ist.psu.edu/19268.html

    > -----Original Message-----
    > From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    > Behalf Of Cecilie Desiree Widsteen
    > Sent: 18 August 2006 11:09
    > To: Corpora list
    > Subject: [Corpora-List] Resources concerning multilabel problem
    >
    > Hello all!
    >
    > I am looking for resources (articles, books, webpages) concerning the
    > multilabel (multiclass?) problem in the context of text classification.
    > By this I mean the fact that a document can be classified into more than
    > one category. Especially w.r.t. supervised learning algorithms, where
    > the documents in the training set may belong to multiple classes.
    >
    > Regards,
    > --
    > Cecilie Widsteen
    > Institute for Informatics,
    > University of Oslo
    >
    > Dear Cecilie,
    >
    > We have recently made available the JRC-Acquis corpus, which is a
    > multilingual (21 languages) document collection multi-labelled according
    > to the Eurovoc thesaurus and aligned at paragraph level for each of the
    > 210 language pairs. You find it for download at:
    >
    >     http://langtech.jrc.it/JRC-Acquis.html
    >
    > Furthermore, in the ‘Publications’ section of our web site
    > (http://langtech.jrc.it/#Publications), you find a number of papers on
    > (typically multilingual) multi-label text categorisation applications
    > (look mainly around the years 2002-2004), including the following:
    >
    >     Pouliquen Bruno, Ralf Steinberger & Camelia Ignat (2003). Automatic
    >     Annotation of Multilingual Text Collections with a Conceptual
    >     Thesaurus. In: Proceedings of the Workshop Ontologies and Information
    >     Extraction at the Summer School The Semantic Web and Language
    >     Technology - Its Potential and Practicalities (EUROLAN'2003).
    >     Bucharest, Romania, 28 July - 8 August 2003.
    >     http://langtech.jrc.it/Documents/EuroLan-03_Pouliquen-Steinberger-et-al.pdf
    >
    > The text categorisation approach described in that paper is used as the
    > major ingredient in our daily news analysis system NewsExplorer (freely
    > accessible at http://press.jrc.it/NewsExplorer) to link related news
    > across languages.
    >
    > I hope this helps. All the best,
    >
    > Ralf
    >
    > Ralf Steinberger (Ralf.Steinberger@jrc.it)
    > European Commission - Joint Research Centre (JRC)
    > IPSC - SeS - Language Technology (http://langtech.jrc.it,
    > http://press.jrc.it/NewsExplorer)
    > T.P. 267, Via Fermi 1
    > 21020 Ispra (VA), Italy
    > Tel: +39 0332 78-6271
    > Fax: +39 0332 78-5154
    > Secretary: +39 0332 78-5648 or 9478
    >
    > New URL: http://langtech.jrc.it. The previous address
    > http://www.jrc.it/langtech will only be valid for a few more months.

    -- 
    Dragomir R. Radev                                         radev@umich.edu
    Associate Professor of Information, Electrical Engineering and
    Computer Science, and Linguistics, the University of Michigan, Ann Arbor
    Phone: 734-615-5225   Fax: 734-764-2475    http://www.si.umich.edu/~radev
    



    This archive was generated by hypermail 2b29 : Fri Aug 18 2006 - 15:40:59 MET DST