Corpora: corpora history revisited - Bible corpus

Jeff ALLEN (jeff@elda.fr)
Tue, 02 Mar 1999 16:57:35 +0100

rykov@iling.msk.su (Vladimir Rykov) wrote:
>> Has anybody heard of any research on, or references to, the Bible as
>> a special corpus of texts?

Mari Broman Olsen wrote:
>Philip Resnik, Mona Diab and I have an article coming out in
>Computers and the Humanities describing our growing collection of
>Bibles and their TEI-compliant annotation. In that article we suggest
>several uses for the Bible as a multilingual parallel corpus,
>including a 'seed' translation lexicon for Machine Translation.

The abstract is available at:
http://www.stg.brown.edu/webs/tei10/tei10.papers/resnik/node1.html

An on-line version of the full paper is available at:
http://www.stg.brown.edu/webs/tei10/tei10.papers/resnik.html

--

Also check out the Polyglot Bible project by Mark Davies at:

http://138.87.135.33/bible/

Mark Davies (mdavies@ilstu.edu)

---

See the ARTFL Project: Multi-lingual Bibles

http://estragon.uchicago.edu/Bibles/
http://estragon.uchicago.edu/Bibles/BIBLES.FAQ.html

---

The following site contains several on-line Bibles that are out of copyright.

http://www.cbc.bryan.tx.us/onlinebible.html

Colleagues and I at the Center for Machine Translation used this site (legally) to download multilingual versions of the Bible. It took about half a day to convert, prepare and manually verify a bilingual interlinear text for one language pair. This worked very well for English/Haitian Creole, for the Machine Translation project we were working on. I think we also created bilingual corpora for English/Croatian and English/Korean using this method. I cannot remember whether we did it for English/French and English/Spanish.
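The verse-pairing step mentioned above could be sketched roughly as follows. This is a minimal illustration, not a reconstruction of the procedure used at the CMT: it assumes (hypothetically) that each downloaded Bible has already been reduced to one verse per line in the form `Book chapter:verse text`, with single-token book abbreviations such as `Gen` or `1Ki`. Verses are paired on their shared reference, and references found in only one version are listed so they can be checked by hand during the manual verification step. All function names are illustrative.

```python
import re

# Hypothetical input format: "<Book> <chapter>:<verse> <text>", e.g. "Gen 1:1 In the beginning ...".
# Book names must be a single token (e.g. "1Ki", not "1 Kings").
VERSE_LINE = re.compile(r"^(\S+)\s+(\d+):(\d+)\s+(.*)$")

def parse_verses(lines):
    """Map (book, chapter, verse) -> verse text; lines that don't match are skipped."""
    verses = {}
    for line in lines:
        m = VERSE_LINE.match(line.strip())
        if m:
            book, chap, verse, text = m.groups()
            verses[(book, int(chap), int(verse))] = text
    return verses

def align(src_lines, tgt_lines):
    """Pair verses that share a reference, keeping the source file's order.

    Returns (pairs, unmatched): pairs are (ref, src_text, tgt_text) tuples;
    unmatched lists references present in only one version, for manual checking.
    """
    src, tgt = parse_verses(src_lines), parse_verses(tgt_lines)
    pairs = [(ref, src[ref], tgt[ref]) for ref in src if ref in tgt]
    unmatched = [ref for ref in src if ref not in tgt] + [ref for ref in tgt if ref not in src]
    return pairs, unmatched

def interlinear(pairs, src_label="SRC", tgt_label="TGT"):
    """Render aligned pairs as a simple interlinear text: reference, then one line per language."""
    out = []
    for (book, chap, verse), s, t in pairs:
        out.append(f"{book} {chap}:{verse}")
        out.append(f"  {src_label}: {s}")
        out.append(f"  {tgt_label}: {t}")
    return "\n".join(out)
```

For example, `align(["Gen 1:1 A", "Gen 1:2 B"], ["Gen 1:1 X", "Gen 1:3 Z"])` pairs only Gen 1:1 and reports Gen 1:2 and Gen 1:3 as unmatched. Verse numbering genuinely differs between some Bible editions, which is one reason a manual check remains necessary.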

-- 

See what SIL is doing in their monthly newsletter, "Notes on Computing":

http://www.sil.org/computing/noc/

More specifically, see what Gary Simons has been investigating:

http://www.sil.org/computing/noc/Vol13/134academic.htm

Text Encoding Initiative Guidelines: For the past five years, Gary Simons and Robin Cover have participated in an international undertaking to develop guidelines for the encoding of textual material in electronic format.

----

Hope that helps.

Jeff

=================================================
Jeff ALLEN - Technical Director
European Language Resources Association (ELRA) &
European Language Resources Distribution Agency (ELDA)
(Agence Européenne de Distribution des Ressources Linguistiques)
55, rue Brillat-Savarin
75013 Paris
FRANCE
Tel: (+33) (0) 1.43.13.33.33 - Fax: (+33) (0) 1.43.13.33.30
mailto:jeff@elda.fr
http://www.icp.grenet.fr/ELRA/home.html