[Corpora-List] Search tool for XCES-encoded parallel corpora?

From: Mickel Grönroos (mickel.gronroos@masterin.com)
Date: Fri Sep 23 2005 - 14:11:22 MET DST

    Hello!

    I am looking for a corpus search tool for querying a parallel corpus
    encoded in XCES format. Any operating system and programming language
    will do. Does anybody know whether such a tool exists, or do I need to
    code it myself?

    Basically, I want to be able to say something like: "Look for the
    word X in language A using my set of sentence alignment files N. Show me
    all sentence pairs in language A and language B where X occurs."

    What I have is three files: one with the text in language A, another
    with the text in language B, and finally a file with the alignment markup
    that aligns the A sentences with the B sentences.

    This is what they look like:

    exampledoc_A.xml:
    [...]
    <p id="p1">
      <s id="p1s1">Aktia nostaa Prime-korkoaan.</s>
      <s id="p1s2">Aktia Säästöpankki Oyj:n johtoryhmä on tänään päättänyt
    nostaa Prime-korkoa 0,5 prosenttiyksiköllä.</s>
    </p>
    [...]

    exampledoc_B.xml:
    [...]
    <p id="p1">
      <s id="p1s1">Aktia höjer sin Prime-ränta.</s>
      <s id="p1s2">Aktia Sparbank Abp:s ledningsgrupp har i dag beslutat att
    höja Prime-räntan med 0,5 procentenheter.</s>
    </p>
    [...]

    examplealign.xml:
    [...]
    <translations>
      <translation trans.loc="exampledoc_A.xml" wsd="iso-8859-1" lang="fi"
    xml:lang="fi" n="1" />
      <translation trans.loc="exampledoc_B.xml" wsd="iso-8859-1" lang="sv"
    xml:lang="sv" n="2" />
    </translations>
    [...]
    <linkList>
      <linkGrp targType="s">
        <link>
          <align xlink:href="#p1s1" />
          <align xlink:href="#p1s1" />
        </link>
        <link>
          <align xlink:href="#p1s2" />
          <align xlink:href="#p1s2" />
        </link>
      </linkGrp>
    </linkList>
    [...]

    I want to be able to say:

    xces_search --searchlanguage=sv 'höjer' examplealign.xml

    What I want to get is:
    Aktia höjer sin Prime-ränta.
    Aktia nostaa Prime-korkoaan.
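
    In case no ready-made tool turns up, the lookup described above could be
    sketched in Python roughly as follows. This is only a sketch under several
    assumptions that are not confirmed by the fragments shown: the three files
    are complete, well-formed XML; the alignment file declares the xlink
    namespace (http://www.w3.org/1999/xlink); the i-th <align> in each <link>
    refers to the i-th <translation>; and trans.loc is a path relative to the
    alignment file. The helper names local, by_name and load_sentences are
    made up for illustration, and xces_search simply reuses the name of the
    hypothetical command above.

```python
import os
import re
import xml.etree.ElementTree as ET

# Attribute names as ElementTree exposes them after namespace expansion.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def by_name(root, name):
    """Elements at or below root with the given local name, in document order."""
    return [el for el in root.iter() if local(el.tag) == name]


def load_sentences(path):
    """Map sentence id -> whitespace-normalised text for one text file."""
    root = ET.parse(path).getroot()
    return {s.get("id"): " ".join("".join(s.itertext()).split())
            for s in by_name(root, "s")}


def xces_search(align_path, search_lang, word):
    """Return one list per matching link, search-language sentence first."""
    root = ET.parse(align_path).getroot()
    base = os.path.dirname(os.path.abspath(align_path))

    langs, texts = [], []
    for t in by_name(root, "translation"):
        langs.append(t.get(XML_LANG) or t.get("lang"))
        texts.append(load_sentences(os.path.join(base, t.get("trans.loc"))))
    src = langs.index(search_lang)

    # Match the word as a whole token, not as part of a longer word.
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")

    hits = []
    for link in by_name(root, "link"):
        # Assumption: the i-th <align> belongs to the i-th <translation>.
        ids = [a.get(XLINK_HREF, "").lstrip("#") for a in by_name(link, "align")]
        sents = [t.get(sid, "") for t, sid in zip(texts, ids)]
        if src < len(sents) and pattern.search(sents[src]):
            hits.append([sents[src]] + sents[:src] + sents[src + 1:])
    return hits
```

    With complete versions of the three example files, calling
    xces_search("examplealign.xml", "sv", "höjer") should give a single hit
    holding the Swedish sentence followed by its Finnish counterpart. The
    whole corpus is read into memory, so this would only suit small corpora.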

    Any ideas?

    Best regards,

    Mickel Grönroos

    --
    Mickel Grönroos, project manager, mickel.gronroos@masterin.com,
    +358 9 2517 4562
    Master's Innovations Ltd., Tekniikantie 14, FIN-02150 Espoo, Finland,
    www.masterin.com
    


