Mickel,
You can use the excellent Corpus Workbench
(http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/). It takes a
bit of work to learn, but it is very flexible and powerful.
To convert your XCES files to CWB format, use the attached Perl scripts
from Jörg Tiedemann's equally excellent UPLUG package.
best,
Lars Nygaard,
The Text Laboratory, University of Oslo
Mickel Grönroos wrote:
> Hello!
>
> I am looking for a corpus search tool that could be used for querying a
> parallel corpus tagged in XCES format. All operating systems and programming
> languages will do. Does anybody know if such a tool exists or do I need to
> code it myself?
>
> Basically what I want to be able to do is say something like: "Look for the
> word X in language A using my set of sentence align files N. Show me all
> sentences in language A and language B where X occurs."
>
> What I have is three files, one file with the text in language A, another
> with the text in language B and finally a file with the alignment markup
> aligning the A sentences with the B sentences.
>
> This is what it looks like:
>
> exampledoc_A.xml:
> [...]
> <p id="p1">
> <s id="p1s1">Aktia nostaa Prime-korkoaan.</s>
> <s id="p1s2">Aktia Säästöpankki Oyj:n johtoryhmä on tänään päättänyt
> nostaa Prime-korkoa 0,5 prosenttiyksiköllä.</s>
> </p>
> [...]
>
> exampledoc_B.xml:
> [...]
> <p id="p1">
> <s id="p1s1">Aktia höjer sin Prime-ränta.</s>
> <s id="p1s2">Aktia Sparbank Abp:s ledningsgrupp har i dag beslutat att
> höja Prime-räntan med 0,5 procentenheter.</s>
> </p>
> [...]
>
> examplealign.xml:
> [...]
> <translations>
> <translation trans.loc="exampledoc_A.xml" wsd="iso-8859-1" lang="fi"
> xml:lang="fi" n="1" />
> <translation trans.loc="exampledoc_B.xml" wsd="iso-8859-1" lang="sv"
> xml:lang="sv" n="2" />
> </translations>
> [...]
> <linkList>
> <linkGrp targType="s">
> <link>
> <align xlink:href="#p1s1" />
> <align xlink:href="#p1s1" />
> </link>
> <link>
> <align xlink:href="#p1s2" />
> <align xlink:href="#p1s2" />
> </link>
> </linkGrp>
> </linkList>
> [...]
>
> I want to be able to say:
>
> xces_search --searchlanguage=sv 'höjer' examplealign.xml
>
> What I want to get is:
> Aktia höjer sin Prime-ränta.
> Aktia nostaa Prime-korkoaan.
>
> Any ideas?
>
> Best regards,
>
> Mickel Grönroos
>
> --
> Mickel Grönroos, project manager, mickel.gronroos@masterin.com,
> +358 9 2517 4562
> Master's Innovations Ltd., Tekniikantie 14, FIN-02150 Espoo, Finland,
> www.masterin.com
>
>
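For a quick lookup without building a CWB index, the search described in the
quoted message can be sketched in a few lines of Python. This is only an
illustration and is not part of CWB or UPLUG. It assumes that every <link>
is one-to-one, that the <align> elements inside a <link> follow the order of
the n attributes on the <translation> elements, that the xlink namespace is
declared in the elided part of the alignment file, and that each file starts
with an XML declaration naming its encoding. All function names are made up
for this example.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def local(name):
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def sentences(root):
    """Map sentence id -> whitespace-normalised text for every <s> element."""
    return {
        el.get("id"): " ".join("".join(el.itertext()).split())
        for el in root.iter()
        if local(el.tag) == "s" and el.get("id")
    }


def translations(align_root):
    """Return (lang, trans.loc) pairs ordered by the n attribute."""
    found = [
        (int(el.get("n")), el.get("lang"), el.get("trans.loc"))
        for el in align_root.iter()
        if local(el.tag) == "translation"
    ]
    return [(lang, loc) for _, lang, loc in sorted(found)]


def links(align_root):
    """Yield, for each <link>, the sentence ids of its <align> children."""
    for link in align_root.iter():
        if local(link.tag) != "link":
            continue
        yield [
            value.rsplit("#", 1)[-1]  # "#p1s1" -> "p1s1"
            for align in link
            if local(align.tag) == "align"
            for key, value in align.attrib.items()
            if local(key) == "href"  # matches xlink:href
        ]


def search(align_path, word, search_lang):
    """Return [hit sentence, aligned sentence(s)] for each link matching word."""
    align_path = Path(align_path)
    align_root = ET.parse(align_path).getroot()
    docs = translations(align_root)
    # Assumption: trans.loc is relative to the directory of the alignment file.
    texts = [sentences(ET.parse(align_path.parent / loc).getroot())
             for _, loc in docs]
    src = [lang for lang, _ in docs].index(search_lang)
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for ids in links(align_root):
        if len(ids) != len(docs):  # skip links that are not one-to-one
            continue
        row = [texts[i].get(sid, "") for i, sid in enumerate(ids)]
        if pattern.search(row[src]):
            hits.append([row[src]] + [s for i, s in enumerate(row) if i != src])
    return hits
```

Under those assumptions, search("examplealign.xml", "höjer", "sv") on the
three example files should return the Swedish sentence followed by its
Finnish counterpart. For anything beyond small files, or for queries on
annotations rather than plain words, CWB remains the better choice.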