[Corpora-List] Density of Language Taxa

From: Yuri Tambovtsev (yutamb@mail.cis.ru)
Date: Tue May 09 2006 - 14:32:34 MET DST


    Dear Corpora colleagues, please comment on the following:
    Yuri Tambovtsev, Novosibirsk Pedagog. University, Russia.
    yutamb@mail.ru
    Dispersion of the Uralic language taxon from a typological viewpoint.
       The goal of this research was to compute the similarity of the
    distribution of 8 consonantal groups (labial, front, palatal, back,
    sonorant, occlusive, fricative and voiced) in the speech sound chains
    of different world languages. The value of the coefficient of variation
    was chosen as the measure of similarity: the lower the value, the more
    similar the languages of a taxon are. Let us analyse the values in
    some language taxa: groups, families and super-families.
    The value of the mean of the coefficient of variation (V%):
    Ugric group (5 languages) - V% = 27.66%
    Volgaic group (4 languages) - V% = 17.90%
    Baltic-Finnic group (7 languages) - V% = 23.24%
    Finno-Ugric family (20 languages) - V% = 23.91%
    Samoyedic family (4 languages) - V% = 16.30%
    Uralic super-family (24 languages) - V% = 28.31%
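
    To make the measure concrete, here is a minimal sketch of how a mean
    V% for one taxon could be computed. It assumes that V% is the standard
    coefficient of variation (100 * standard deviation / mean), taken for
    each of the 8 consonantal groups across the languages of the taxon and
    then averaged. The post does not spell out these details, so the use of
    the sample standard deviation, the function names and the toy
    frequencies below are all assumptions, not data from the study.

```python
from statistics import mean, stdev

# The 8 consonantal groups named in the post.
GROUPS = ("labial", "front", "palatal", "back",
          "sonorant", "occlusive", "fricative", "voiced")


def coefficient_of_variation(values):
    """Return V% = 100 * sample standard deviation / mean (assumed definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; V% is undefined")
    return 100.0 * stdev(values) / m


def taxon_dispersion(freqs_by_language):
    """Mean V% over the consonantal groups for one taxon.

    freqs_by_language maps a language name to a dict of
    group -> frequency (percent of sounds in the speech chain).
    At least two languages are needed to compute a deviation.
    """
    per_group = []
    for group in GROUPS:
        values = [freqs[group] for freqs in freqs_by_language.values()]
        per_group.append(coefficient_of_variation(values))
    return mean(per_group)


# Hypothetical frequencies for three invented languages (not real data).
toy_taxon = {
    "language_A": dict(zip(GROUPS, (12.0, 35.0, 3.0, 9.0, 24.0, 20.0, 14.0, 11.0))),
    "language_B": dict(zip(GROUPS, (10.0, 38.0, 2.0, 11.0, 27.0, 18.0, 12.0, 14.0))),
    "language_C": dict(zip(GROUPS, (14.0, 33.0, 4.0, 8.0, 22.0, 23.0, 15.0, 9.0))),
}

print(f"mean V% of the toy taxon = {taxon_dispersion(toy_taxon):.2f}%")
```

    A taxon whose languages had identical frequencies in every group
    would score 0%; the more the languages differ, the higher the value.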
    The value of the mean of the coefficient of variation of the Ugric
    group (27.66%) is notably large. We can compare it to the corresponding
    means of the groups of the Indo-European family: Baltic (2 languages)
    - 9.08%; Iranian (8 languages) - 11.69%; Slavonic (12 languages) -
    15.78%; Indic - 20.40%; Germanic (6 languages) - 24.51%.
    It is possible to explain the large dispersion of
    the Ugric group by the fact that the structure of the Hungarian
    speech sound chain is very different from those of Mansi and Hanty.
    The low value of the mean of the coefficient of variation in the
    Samoyedic language taxon (16.30%) may tell us that the Samoyedic
    languages are typologically more similar to one another than the
    Indic (20.40%) or Germanic (24.51%) languages are. If we unite the
    Finno-Ugric languages (23.91%) and the Samoyedic languages (16.30%)
    into one language taxon, called Uralic, then the dispersion increases
    to 28.31%, which is much greater than that of either the Finno-Ugric
    or the Samoyedic family taken separately. This means that
    typologically these two parts are quite different, which is why one
    should be cautious about uniting them. They seem quite different from
    the point of view of the distribution of the consonants in their
    speech chains. Usually, genetically related languages have similar
    speech sound chains, that is, they are typologically close. Based on
    the typological data, it is possible to suppose that the Finno-Ugric
    and Samoyedic languages have developed in different directions and
    that the distance between them is rather great.
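
    The effect of uniting two taxa can be illustrated with a small sketch.
    It assumes the same reading of V% as the standard coefficient of
    variation and uses invented numbers for a single consonantal group, so
    it shows only the general mechanism, not the Uralic figures: two
    families that are each internally compact, but centred on different
    values, yield a larger V% when pooled than either has on its own.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return V% = 100 * sample standard deviation / mean (assumed definition)."""
    return 100.0 * stdev(values) / mean(values)


# Invented frequencies (percent) of one consonantal group in two
# hypothetical families; these are not measurements from the study.
family_a = [10.0, 11.0, 12.0]
family_b = [20.0, 21.0, 22.0]

v_a = coefficient_of_variation(family_a)
v_b = coefficient_of_variation(family_b)
v_pooled = coefficient_of_variation(family_a + family_b)

print(f"family A: V% = {v_a:.2f}")
print(f"family B: V% = {v_b:.2f}")
print(f"pooled:   V% = {v_pooled:.2f}")
```

    The pooled value exceeds both separate values because the gap between
    the two family means is added to the spread within each family.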
    I would like to hear colleagues' comments concerning the distances
    between the languages inside language groups, families and
    super-families based on typological data. I would be glad to
    co-operate with linguists who may be interested in my method. It is
    possible to study the density and dispersion of the American Indian
    language taxa, the taxa of the Aboriginal languages of Australia,
    etc. I look forward to hearing from you soon at yutamb@mail.ru.
    Yours sincerely, Yuri Tambovtsev, Novosibirsk Pedagog. University,
    Novosibirsk, Russia.


