I have an internet-based search interface for grammatically annotated
corpora at http://corp.hum.sdu.dk, covering Germanic (en, de, da) and
Romance languages (pt, es, fr, eo). The Portuguese Público corpus (from
Linguateca) has been experimentally PALAVRAS-annotated with NE tags, on
top of PoS and syntax. For the time being, though, only PoS and syntax
can be searched through a menu. NE tags have to be written in the 'extra'
field, or cqp-style: [extra="civ"], which makes them accessible only to
people who know they are there ... To match names only, it may be better
to use [extra="civ" & pos="PROP"], since nouns (N) have semantic
categories, too. There is also a Flash film showing how to use the
interface, as well as, of course, cqp documentation on the internet.
The major categories are:
hum = person names
org = organisation names
top = natural place names
occ = event, occasion
title = work of art
brand = brand names, things
and the hybrid categories:
civ = civitas (countries, towns etc.): +HUM, +LOC
media = newspapers etc.: +org, +title
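
For anyone scripting their searches, the tags above can be combined with
the cqp-style pattern mentioned earlier. The following is only a minimal
sketch in Python: the names NE_TAGS and ne_query are made up for
illustration and are not part of the corpus interface, and the only
attributes assumed are 'extra' and 'pos' as used in the queries above.

```python
# Tag descriptions copied from the category list in this message.
# NE_TAGS and ne_query are hypothetical helper names, not part of
# the search interface itself.
NE_TAGS = {
    "hum": "person names",
    "org": "organisation names",
    "top": "natural place names",
    "occ": "event, occasion",
    "title": "work of art",
    "brand": "brand names, things",
    "civ": "civitas (countries, towns etc.)",
    "media": "newspapers etc.",
}


def ne_query(tag, proper_nouns_only=True):
    """Build a cqp-style single-token pattern for one NE tag."""
    if tag not in NE_TAGS:
        raise ValueError(f"unknown NE tag: {tag!r}")
    conditions = [f'extra="{tag}"']
    if proper_nouns_only:
        # Common nouns (N) carry semantic categories too, so restrict
        # the match to proper nouns when only names are wanted.
        conditions.append('pos="PROP"')
    return "[" + " & ".join(conditions) + "]"


print(ne_query("civ"))
print(ne_query("media", proper_nouns_only=False))
```

The resulting string would then be pasted into the query field by hand;
nothing here talks to the server.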
Some articles on Danish and Portuguese NE are listed and/or available at
http://beta.visl.sdu.dk/~eckhard/Artikeloversigt.html. Hope it's useful!
Eckhard Bick
Thamar Solorio wrote:
> Hi!
> I've been searching for Portuguese corpora annotated with Named
> Entities. So far I've only found raw corpora and portals to Portuguese
> analyzers such as the one from the VISL project, but it is only for
> online use and it does not provide NE classification.
> So, if anyone knows of an available Portuguese corpus tagged with NE,
> I'd appreciate it if you let me know.
> Thanks!
> Thamar Solorio
> Coord. Ciencias Computacionales
> Instituto Nacional de Astrofísica, Óptica y Electrónica
> Luis Enrique Erro #1, Tonantzintla, Puebla
> México
> http://ccc.inaoep.mx/~thamy
--
Eckhard Bick, cand.med., dr.phil.
Southern Denmark University
e-mail: lineb@hum.au.dk
web: http://beta.visl.sdu.dk