[Corpora-List] Re: Chinese-English-Russian parallel corpora:

From: Jiangping Chen (jpchen@unt.edu)
Date: Mon Jan 30 2006 - 17:51:34 MET


    Thanks a lot for sharing these resources.

    Jiangping
     
    Jiangping Chen, Ph.D.
    Assistant Professor
    School of Library and Information Sciences
    University of North Texas
    P.O. Box 311068
    Denton, TX 76203
    Phone: (940) 369-8393
    Fax: (940) 565-3101

    >>> Philip Resnik <resnik@umiacs.umd.edu> 01/30/06 9:44 AM >>>

    "Olga Mitrofanova" <alkonost@OM12520.spb.edu> wrote:
    > Here is a summary of useful links concerning Chinese-English-Russian
    > parallel corpora prepared by Inna Lazareva (St-Petersburg University):

    Here are three more resources that may be useful to anyone working
    with Chinese-English parallel text:

    - The Linguist's Search Engine (http://lse.umiacs.umd.edu) provides
      access to a collection of over 118,000 Chinese pages. These were
      mined from the Web using a technique that automatically finds
      Chinese-English page pairs, so the English translation is also
      available when you look at a Chinese result. To search the Chinese
      collection, go to "Query Options", and under "Collection to Search",
      select "Public Collection: chinese_web"; then, under "Example
      Sentence", change "Language" from English to Chinese. To see the
      corresponding English for a hit, click "Annotation".

      The LSE Web page has links to detailed documentation. Note that the
      Chinese pages have also been automatically classified by document
      difficulty, and this "Level" can be used to narrow the search.

    - The Linguist's Search Engine also provides English search of the
      Bible (in modern English translation). When you click "Annotation"
      for a result, it shows the corresponding verse in dozens of other
      languages, including Chinese.

    - For a collection of over 500,000 Chinese-English Web page pairs,
      mined automatically, see http://umiacs.umd.edu/~resnik/strand/ under
      the "English-Chinese (July 2003)" link. A heavily filtered version
      of this collection was used to create the LSE's chinese_web
      collection, above.

    Hope this is helpful!

      Philip

      ----------------------------------------------------------------
      Philip Resnik, Associate Professor
      Department of Linguistics and Institute for Advanced Computer Studies

      1401 Marie Mount Hall              UMIACS phone: (301) 405-6760
      University of Maryland             Linguistics phone: (301) 405-8903
      College Park, MD 20742 USA         Fax: (301) 314-2644 / (301) 405-7104
      http://umiacs.umd.edu/~resnik      E-mail: resnik@umiacs.umd.edu


