We are pleased to announce version 5.0 of COMPARA, with over one million
words of English and Portuguese parallel texts.
COMPARA is an extensible bidirectional parallel corpus of English and
Portuguese that is freely accessible at http://www.linguateca.pt/COMPARA/.
The corpus has been continuously improved since its first version back in
2000. Version 5.0 is the result of an extensive revision of the corpus and
The corpus is encoded in the IMS Corpus Workbench system and is searchable
via the DISPARA Web interface. Alignment is based on the source-text
sentence and allows users to search for sentences that have been joined,
split, added to, deleted from, and reordered in translation. Other
searchable features are translators' notes, foreign words, titles, emphasis
and named entities.
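
As a rough illustration of what alignment based on the source-text sentence
means, the sketch below models one alignment unit as a single source sentence
together with the translation sentences that render it. The names
AlignmentUnit and classify are hypothetical and are not part of COMPARA,
DISPARA or the IMS Corpus Workbench. Only three cases are covered; joined,
added and reordered sentences would need information about neighbouring
units that this toy model does not hold.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlignmentUnit:
    """Hypothetical model: one source sentence and its translation sentences."""
    source: str
    targets: List[str] = field(default_factory=list)


def classify(unit: AlignmentUnit) -> str:
    """Label a unit by how many translation sentences render the source sentence."""
    count = len(unit.targets)
    if count == 0:
        return "deleted"      # the source sentence has no counterpart
    if count == 1:
        return "one-to-one"   # a single sentence renders the source sentence
    return "split"            # the source sentence became several sentences


if __name__ == "__main__":
    # Invented sample sentences, for illustration only.
    units = [
        AlignmentUnit("She left.", ["Ela saiu."]),
        AlignmentUnit("He waited, then he left.", ["Ele esperou.", "Depois saiu."]),
        AlignmentUnit("Well.", []),
    ]
    for unit in units:
        print(classify(unit), "|", unit.source)
```

Keeping the source sentence as the unit means every source sentence appears
exactly once, however the translator chose to segment the text.
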
Version 5.0 contains 39 aligned text extracts of published fiction by 27
different authors from Angola, Brazil, Mozambique, Portugal, South Africa,
the United Kingdom and the United States, and 25 more texts are in the
New features in COMPARA version 5.0 include:
- all texts have been revised for encoding of single and double quotes
(and made distinct from apostrophes)
- a new semantics was given to the structural markup <foreign>,
<title> and <emph>, and a new category was added, <named> (for named
entities)
- a new procedure for sentence definition, regarding the colon, was
- a better and more complete display of the results, as well as of the
corpus overview, was implemented
- an improvement in the random choice of hits to be displayed was
- a new search and display feature was added, that of original vs.
Ana Frankenberg-Garcia & Diana Santos
Diana Santos, Diana.Santos@sintef.no
SINTEF Telecom & Informatics
Pb 124 Blindern, N-0314 Oslo, Norway