[Corpora-List] Re: Chinese-English-Russian parallel corpora:

From: Philip Resnik (resnik@umiacs.umd.edu)
Date: Mon Jan 30 2006 - 16:44:28 MET

    "Olga Mitrofanova" <alkonost@OM12520.spb.edu> wrote:
    > Here is a summary of useful links concerning Chinese-English-Russian
    > parallel corpora prepared by Inna Lazareva (St-Petersburg University):

    Here are three more resources that may be useful to those
    interested in Chinese-English parallel text:

    - The Linguist's Search Engine (http://lse.umiacs.umd.edu) provides
      access to a collection of over 118,000 Chinese pages. These were
      mined from the Web using a technique that
      automatically finds Chinese-English page pairs, which means that the
      English translation is also available when you look at a Chinese
      result. To search the Chinese collection, go to "Query Options", and
      under "Collection to Search", select "Public Collection:
      chinese_web"; then, under "Example Sentence", change "Language" from
      English to Chinese. To see the corresponding English for a hit,
      click "Annotation".

      The LSE Web page has links to detailed documentation. Note that the
      Chinese pages have also been automatically classified as to level of
      document difficulty, and this "Level" can be used to narrow the
      search.

    - The Linguist's Search Engine also provides English search of the
      Bible (in modern English translation). When you click "Annotation"
      for a result, it shows the corresponding verse in dozens of other
      languages, including Chinese.

    - For a collection of over 500,000 Chinese-English Web page pairs,
      mined automatically, see http://umiacs.umd.edu/~resnik/strand/ under
      the "English-Chinese (July 2003)" link. A heavily filtered version
      of this collection was used to create the LSE's chinese_web
      collection, above.

    Hope this is helpful!

      Philip

      ----------------------------------------------------------------
      Philip Resnik, Associate Professor
      Department of Linguistics and Institute for Advanced Computer Studies

      1401 Marie Mount Hall          UMIACS phone: (301) 405-6760
      University of Maryland         Linguistics phone: (301) 405-8903
      College Park, MD 20742 USA     Fax: (301) 314-2644 / (301) 405-7104
      http://umiacs.umd.edu/~resnik  E-mail: resnik@umiacs.umd.edu
