Re: [Corpora-List] Ratio of ambiguous tokens in Swedish, Danish and Norwegian

From: Joakim Nivre (nivre@msi.vxu.se)
Date: Thu Feb 15 2007 - 14:31:40 MET


    Hi Hrafn,

    You can find some statistics about Swedish in our article:

    Nivre, J. and Grönqvist, L. (2001) Tagging a Corpus of Spoken Swedish.
    International Journal of Corpus Linguistics 6(1), 47-78.

    A pre-print is available from my home page at:
    http://w3.msi.vxu.se/~nivre/research/publ.html

    The percentage of ambiguous tokens we get for the Stockholm-Umeå Corpus is
    45.37%. However, this is measured with the base tag set, which consists of
    only 23 tags. With the full tag set of some 150 tags, the percentage will be
    higher. This is one of the reasons why it is very difficult to compare these
    figures across languages and corpora. You will find more details in the
    paper. (The first place to look is Table 1.)
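    As an illustration of the tag-set point (not the procedure used in the
    paper), the Python sketch below computes an ambiguity rate under one common
    definition: a token counts as ambiguous if its word form occurs with more
    than one distinct tag anywhere in the corpus. It assumes the lexicon is
    induced from the corpus itself; published figures may instead rely on an
    external dictionary or morphological analyser. The function names, the toy
    SUC-style tags and the base-tag mapping are all hypothetical.

```python
from collections import defaultdict


def ambiguity_rate(tagged_tokens, collapse=None):
    """Return the percentage of tokens whose word form has more than one tag.

    Assumption: the lexicon (word form -> set of tags) is induced from the
    corpus itself. `collapse` optionally maps a full tag to a coarser tag.
    """
    if not tagged_tokens:
        return 0.0
    lexicon = defaultdict(set)
    for word, tag in tagged_tokens:
        # Use the full tag unless a coarser mapping is supplied.
        lexicon[word.lower()].add(tag if collapse is None else collapse(tag))
    ambiguous = sum(
        1 for word, _ in tagged_tokens if len(lexicon[word.lower()]) > 1
    )
    return 100.0 * ambiguous / len(tagged_tokens)


def base_tag(tag):
    """Hypothetical mapping: keep only the part before the first dot."""
    return tag.split(".")[0]


# Hypothetical toy corpus with SUC-style "BASE.FEATURES" tags.
corpus = [
    ("hus", "NN.NEU.SIN.IND.NOM"),   # "hus" (house), singular
    ("hus", "NN.NEU.PLU.IND.NOM"),   # same form, plural
    ("för", "PP"),                   # preposition reading
    ("för", "AB"),                   # adverb reading
    ("katt", "NN.UTR.SIN.IND.NOM"),
]

print(ambiguity_rate(corpus))            # 80.0 with full tags
print(ambiguity_rate(corpus, base_tag))  # 40.0 with base tags
```

    With full tags, the singular and plural readings of "hus" make both of its
    tokens ambiguous; collapsing to base tags merges them, so the rate drops,
    which mirrors the base vs. full tag set difference described above.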

    Best,
    Joakim

    On Thu, 15 Feb 2007, Hrafn Loftsson wrote:

    > Hi everyone,
    >
    > (It has been pointed out to me that, for some reason, my message to the
    > list appeared empty in some e-mail systems. Here is a second try:)
    >
    > The paper "J. Hajic (2000) Morphological tagging: Data vs.
    > Dictionaries" reports percentages of ambiguous tokens for English
    > (38.65%), Czech (45.97%), Estonian (40.24%), Hungarian (21.58%),
    > Romanian (40.00%) and Slovene (38.01%), using an annotated version of
    > Orwell's novel 1984 for each of these languages.
    >
    > I need the corresponding percentages for Swedish, Danish and
    > Norwegian, calculated using ANY corpora.
    >
    > Does anyone have this information (and preferably a reference to a paper
    > that discusses the issue)?
    >
    > Regards,
    > Hrafn Loftsson
    > Assistant Professor
    > Department of Computer Science
    > School of Science and Engineering
    > Reykjavik University
    > Iceland

    ==================================================================
    Joakim Nivre

    Växjö University
    School of Mathematics and Systems Engineering
    SE-35195 Växjö
    Tel: +46 470 708992
    Fax: +46 470 84004
    E-mail: nivre@msi.vxu.se

    Uppsala University
    Department of Linguistics and Philology
    Box 635, SE-75126 Uppsala
    Tel: +46 18 4717009
    Fax: +46 18 4711094
    E-mail: joakim.nivre@lingfil.uu.se

    URL: http://www.msi.vxu.se/users/nivre
    ==================================================================


