Dear Serge,

Thank you for your answer and kind offer.

As for your suggestion of using Wacky, our problem is not so much obtaining "newswire" text from the web. We could in fact obtain that text from the publicly available 14 GB collection of the Portuguese web, WPT 03 (please see http://poloxldb.linguateca.pt/index.php?l=WPT_03 ), by using a procedure similar to the one you mentioned, since it is all indexed in a MySQL database. Our problem is instead obtaining a newswire collection that is manually classified by topic/domain and comparable to the English one. :)

I was wondering whether such a collection might in fact be available, since Reuters is a global news agency and I am sure that they produce a huge number of newswire texts every day in several languages.

Best,
LS

--- On Wed 11/16, Serge Sharoff < s.sharoff@leeds.ac.uk > wrote:
From: Serge Sharoff [mailto: s.sharoff@leeds.ac.uk]
To: parapraxe@excite.com
Cc: corpora@hd.uib.no
Date: Wed, 16 Nov 2005 10:57:39 +0000
Subject: Re: [Corpora-List] REUTER corpus online?

Luis,

we have an online interface to the Reuters corpus (indexed by Corpus Workbench). It's available from:
http://corpus.leeds.ac.uk/

Because of the agreement with Reuters, access is mostly limited to in-house research. However, we can provide a password for research-related concordancing.

As for Portuguese, if you have a reasonable list of words frequent in Portuguese newswires and a tagger/lemmatiser, a corpus like this can be collected from the web. See the Wacky initiative:
http://wacky.sslmit.unibo.it/

Best wishes,
Serge

On Tue, 2005-11-15 at 10:45 -0500, Luis Sarmento wrote:
> Dear Corpora-List members,
>
> Does anyone know if there is any publicly available online version of
> the Reuters corpus? In other words, is there any (free) web concordance
> tool for the Reuters Corpus?
>
> Btw, I wonder if there are comparable versions of the Reuters corpus
> available, namely in Portuguese, for bilingual studies. Is anyone
> using a "comparable" version of Reuters in Portuguese?
>
> Thanks to all,
>
> Luís Sarmento

--
Dr. Serge Sharoff
Centre for Translation Studies
School of Modern Languages and Cultures
University of Leeds
Leeds, LS2 9JT
tel: +44(0)113 343 7287
fax: +44(0)113 343 3287
This archive was generated by hypermail 2b29 : Thu Nov 17 2005 - 14:55:04 MET