Re: Corpora: Arabic and Natural Language processing

Chris Brew
Mon, 22 Sep 1997 09:14:40 +0100 (BST)

>Dear Friends,
> I am an M.Sc. student in computer science and a vendor at
>IBM-Egypt. As a student interested in NLP, and as an Egyptian whose mother
>tongue is Arabic, I have found that one of the main obstacles to
>achieving real progress in this field is the availability of electronic Arabic
>texts. I invite everyone interested in discussing this subject to a side
>conversation on this topic.
>I have some ideas I want to share with you.
> Mohamed Farouk Noamany

From the Language Technology FAQ:


Texts/Corpora: No 0050

Can you tell me where to find Arabic texts?

The largest Arabic corpus available is the Al-Hayat 1995 CD (for the Mac).
It has some 140MB of data (about 23M words) in about 44,000 files, all
in Arabic Mac encoding (a superset of ISO 8859-6). It is available from:

Dr. Imad Bachir
Al-Hayat Publishing Company
Kensington Centre
66 Hammersmith Road
+44 (0) 171 602 9988 (Tel);
+44 (0) 171 602 4963 (Fax)

Also, Khalid Choukri suggests:

You should contact either Fathi Debili from the French Research Center,
or Ms. Nadia Hegazi from ERI - CAIRO

Last edited by Colin Matheson, 10-07-97

Address: Language Technology Group, HCRC,
2 Buccleuch Place, Edinburgh EH8 9LW, Scotland
Telephone: +44 131 650 4632 Fax: +44 131 650 4587