SV: [Corpora-List] Information about content analysis software

From: Santos Diana (Diana.Santos@sintef.no)
Date: Thu Mar 23 2006 - 22:08:59 MET


    Dear Flávio,

    Corpógrafo (www.linguateca.pt/Corpografo), developed by the Porto node of Linguateca (Belinda Maia, Luís Sarmento, Ana Sofia Pinto, Luís Miguel Cabral and others) is a system that processes Portuguese (and several other languages as well) and has a lot of the functions you require.

    It is more general than InXight in that it was designed to discover terms (mainly NPs with a common noun head)
    and not only named entities. However, it has no summarization capabilities.

    Although it was initially developed for teaching terminology (and we currently have more than 600 users around the world), we are now extending it toward the kind of functionality you mention: helping to develop ontologies from text and visualize them, and semi-automatically discovering definitions.

    See our paper in LREC this year for more information:

    Luís Sarmento, Belinda Maia, Diana Santos, Ana Pinto & Luís Cabral. "Corpógrafo V3: From Terminological Aid to Semi-automatic Knowledge Engine". To appear in Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006) (Genoa, Italy, 22-28 May 2006). http://www.linguateca.pt/Diana/download/SarmentoetalLREC2006.pdf

    Greetings,
    Diana
    ---------------
    Diana Santos
    www.linguateca.pt
    Linguateca, Oslo node, SINTEF ICT
    Pb 124 Blindern, N-0314 Oslo, Norway

    ________________________________

    From: owner-corpora@lists.uib.no on behalf of Flávio Barbosa
    Sent: Fri 17.03.2006 18:14
    To: corpora@uib.no
    Subject: [Corpora-List] Information about content analysis software

    My name is Flávio. I work at Research and Documentation Management in MULTIRIO (www.multirio.rj.gov.br), an entity created by the Municipal Government of Rio de Janeiro with the purpose of enhancing education and cultural understanding by creating, producing and broadcasting information via TV, press and the Web.
    We'd like recommendations for content analysis software that satisfies the following needs (we've already found some options, like Tropes and Inxight, but the research report should present other possibilities):

    1) It should process information in Portuguese (and other major languages), which is indispensable;
    2) it should process multimedia material;
    3) it should process files in different text editor formats, as well as pdf, html etc.;
    4) it should automatically summarize text content;
    5) it should process text units of different sizes (not only words or expressions, but also the meaning of larger stretches of text);
    6) it should be possible to visualize results graphically, with varied visualization options;
    7) it should be possible to freely create semantic categories for content extraction.

    Thanks for your help. If you know of software that satisfies most of the needs above, even if not all of them, we'd also be grateful to have this information.
    -----
    Flávio Barbosa (flaviobarbosa@rio.rj.gov.br), researcher
    MultiRio -- Empresa Municipal de Multimeios
    Research and Documentation Management
    Phones: 55 21 2528-8258
                  55 21 2528-8244




    This archive was generated by hypermail 2b29 : Thu Mar 23 2006 - 22:11:20 MET