[Corpora-List] our corpora on world languages

From: Yuri Tambovtsev (yutamb@mail.cis.ru)
Date: Thu Jul 03 2003 - 19:34:08 MET DST

    Dear colleagues, I am sending you this information in case you
    could publish it in your electronic newsletter. I do hope we might establish
    a joint project. I'd like to tell you about our group of Phonostatistics
    and Typological Studies. It would be very kind of you to let
    me know about your activities in the field of phonostatistics and
    typology in the West. I had planned to attend conferences in the West
    (for instance in Prague) to renew my contacts or to set up new ones.
    Actually, now that democracy has come to Russia, it is harder to travel to
    the West from Novosibirsk than before, since transportation costs
    more than it did when every post-graduate student could pay for his
    ticket to Moscow. Now a Novosibirsk linguist cannot find enough
    money to go even to Moscow. I failed to find a bursary for my trip to
    Prague or to any other conference in the West.
    This is why your e-mail information is of great interest and importance
    to us. In fact, e-mail is our only contact with colleagues in
    the profession.

    If you can inform us about any international conferences on
    phonostatistics, we'd be most grateful. Please be so kind as to let us
    know. Our group for phonological studies of Siberian, Paleo-Asiatic,
    Uralo-Altaic, Far East, Oceanian languages and some isolated languages
    (Korean, Nivh, Ket, Yukaghir, Japanese) is looking
    forward to establishing close contacts with colleagues around the
    world in these fields of linguistics: typology and
    phonostatistics. Many articles on Siberian, Finno-Ugric, Turkic,
    Mongolian, Tungus-Manchurian and Paleo-Asiatic
    languages could be published on the basis of our data. Now our small group is
    working on the texts of the 112th world language in our collection:
    Dolgan. We have computed the following world
    languages: 1. Japanese; 2. Nivh; 3. Ket;
    (Finno-Ugric): 4. Mansi (Vogul): Sygva, Sosva, and Konda dialects;
    5. Hanty (Osjak): Kazym and Eastern dialects; 6. Hungarian;
    7. Komi-Zyrian; 8. Udmurt (Votiak); 9. Mari (Cheremis): Mountain and
    Meadow dialects; 10. Mordovian: Erzia and Moksha; 11. Vepsian;
    12. Vodian; 13. Karelian: Tihvin, Livvikov and Ljudikov;
    14. Saami (Lopari); 15. Finnish;
    (Samoyedic): 16. Nganasan;
    (Turkic): 17. Azeri (Azerbaidjanian); 18. Tatar: Siberian-Baraba and
    Kazan; 19. Altai (Kizhi); 20. Kumandin (Altai); 21. Turkish;
    22. Turkmen; 23. Jakut (Saha); 24. Karakalpak; 25. Kazah; 26. Kirgiz;
    27. Tofalar; 28. Shorian; 29. Dolganian; 30. Hakas; 31. Ujgur;
    32. Uzbek;
    (Tungus-Manchurian): 33. Nanai; 34. Negidal; 35. Evenk (Tungus);
    36. Even; 37. Uljch; 38. Orok; 39. Oroch; 40. Nivh;
    (Mongolian): 41. Mongolian; 42. Buriatian; 43. Kalmykian;
    (Slavonic): 44. Russian; 45. Ukrainian; 46. Belorussian; 47. Sorbian;
    48. Serbo-Croatian;
    (Iranian): 49. Gilian; 50. Persian (Iranian); 51. Tadjikian; 52. Pushto;
    (Paleo-Asiatic): 53. Iteljmen (Kamchadal); 54. Chuckchian; 55. Jukagir;
    56. Eskimo: Siberian and American; 57. Arabic; 58. Mangarayi (Aboriginal
    Australian); 59. Korean and many others, 111 all in all. Many of
    these languages are endangered. I'm sure it is high time to establish
    corpora for the endangered languages. I wonder what the world's linguists
    think about this idea. Should corpora for the endangered languages
    be created or not? Is it important, or should we forget about
    the idea as unimportant?

    Our main goal, though,
    is to identify the universal characteristics of the sound pictures of
    world languages and to calculate the phonological distances between them
    on the basis of the frequency of occurrence of phonemes and phonemic
    groups. Then we plan to publish word frequency dictionaries of the
    languages mentioned above. As a matter of fact, the data for many of these
    languages are still on old punch cards, but we are transferring them to
    PC diskettes. Many of the texts (e.g. Japanese, Persian, Arabic, Hebrew,
    Korean) are entered in the form of phonological transcription. We could
    exchange some of the material in electronic form. We would also be happy
    to work on a joint project with linguists all over the world.
    Yuri Tambovtsev, Novosibirsk, Russia. E-mail address:
      yutamb@hotmail.com
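
    The message mentions calculating phonological distances on the basis of
    the frequency of occurrence of phonemes. The message does not say which
    distance measure the group uses, so the sketch below is only an
    illustration under stated assumptions: it takes each text as a list of
    phoneme symbols and compares two relative-frequency profiles with a plain
    Euclidean distance. The function names and the toy transcriptions are
    hypothetical and are not taken from the group's data.

```python
from collections import Counter
from math import sqrt


def phoneme_frequencies(phonemes):
    """Return the relative frequency of each phoneme in a transcribed text.

    `phonemes` is assumed to be a sequence of phoneme symbols, one per item.
    """
    counts = Counter(phonemes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty transcription")
    return {p: n / total for p, n in counts.items()}


def frequency_distance(freq_a, freq_b):
    """Euclidean distance between two relative-frequency profiles.

    This measure is an assumption chosen for illustration; a phoneme missing
    from one profile is treated as having frequency 0 there.
    """
    inventory = set(freq_a) | set(freq_b)
    return sqrt(
        sum((freq_a.get(p, 0.0) - freq_b.get(p, 0.0)) ** 2 for p in inventory)
    )


# Invented toy transcriptions, not real language data.
sample_a = ["k", "a", "t", "a", "n", "a"]
sample_b = ["k", "o", "t", "o", "n", "a"]

distance = frequency_distance(
    phoneme_frequencies(sample_a), phoneme_frequencies(sample_b)
)
```

    The same profile could be built over phonemic groups (for example all
    labial consonants) by mapping each symbol to its group before counting.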


