[Corpora-List] Release of the FetchProt Corpus

From: Kristofer Franzén (franzen@sics.se)
Date: Fri Sep 23 2005 - 16:32:07 MET DST

    Dear colleagues,

    I am pleased to announce the first release of the FetchProt corpus.
    It is based on 177 full-text journal articles from the biological domain,
    analyzed for experiments on proteins to validate tyrosine kinase activity.
    The 177 filled template files contain 591 experiments on wild types and
    82 different mutants of 77 proteins.
    Apart from the template files, the corpus includes text versions of the
    articles with the analyzed content tagged, as a reference to where in the
    article the information in the template is to be found.
    The proteins and experiments are, among other things, linked to UniProt
    identity codes and Gene Ontology molecular function codes.

    The corpus has been compiled within the FetchProt project, a
    collaboration between the Swedish Institute of Computer Science (SICS),
    the Center for Genomics and Bioinformatics at Karolinska Institutet
    (CGB/KI) and Metamatrix AB, and has received partial funding from
    VINNOVA, the Swedish Agency for Innovation Systems.
    The aim of the project is to build a system that aids in populating the
    EXProt database of proteins with experimentally verified functions, by
    means of information extraction from full-text scientific journal papers.

    More information on the corpus and its analysis can be found in the
    documentation at
    http://fetchprot.sics.se/Corpus/Release20050923/FetchProtCorpusDocumentation1.0.pdf

    The corpus is free to download from the project homepage at
    http://fetchprot.sics.se/

    Best regards,

    Kristofer Franzén
    Swedish Institute of Computer Science


