[Corpora-List] the IPI PAN Corpus of Polish

From: Adam Przepiorkowski (adamp@ipipan.waw.pl)
Date: Wed Mar 22 2006 - 23:31:33 MET

    The 2nd edition of the IPI PAN Corpus of Polish, developed
    at the Institute of Computer Science of the Polish Academy
    of Sciences (PAS), is now available on the websites of:

    - the Institute of Computer Science PAS:
      http://korpus.pl/en/
    - the Institute of Polish Language PAS:
      http://corpus.ijp-pan.krakow.pl/en/

    To the best of our knowledge, this is currently the largest
    searchable morphosyntactically annotated corpus of Polish
    available to the public.

    The full corpus consists of over 250 million segments
    (about 200 million orthographic words) and is not
    balanced; a balanced sample of over 30 million segments
    is also available. Both corpora can be searched directly
    at the above addresses (please read the query syntax
    cheatsheet at http://korpus.pl/en/cheatsheet/index.html)
    or downloaded in binary form for use with a standalone
    version of the corpus search engine Poliqarp (announced
    separately on the 'corpora' list). Note that the
    standalone Poliqarp offers much greater functionality
    than the web interface (e.g., it shows metadata and
    presents more results).

    Best regards,

    Adam P.

    -- 
    Adam Przepiorkowski
    http://nlp.ipipan.waw.pl/ ----- Linguistic Engineering Group
    http://korpus.pl/ ------------- the IPI PAN Corpus of Polish