Re: Corpora: Noun phrases categories

From: Francis Bond (bond@cslab.kecl.ntt.co.jp)
Date: Mon May 20 2002 - 04:32:49 MET DST

    G'day,

    Fuchun> I am working on classifying noun phrases into several
    Fuchun> categories, such as mass NPs and count NPs, and even dividing
    Fuchun> each category further. The goal is to develop better language
    Fuchun> models for noun phrases modeling. and If it works, we can
    Fuchun> develop better language models for sentences and better NP
    Fuchun> chunkers.

    Fuchun> I am wondering are there any previous work done on this topic?
    Fuchun> How many categories should we divide noun phrases into and are
    Fuchun> there such labeled data?

    There is a vast literature on this in linguistics. Two references I
    found particularly interesting are:

    @Book{AnnaW:1988,
      author = "Anna Wierzbicka",
      title = "The Semantics of Grammar",
      publisher = "John Benjamins",
      address = "Amsterdam",
      year = 1988
    }

    @article{Allan:1980,
      author = "Keith Allan",
      title = "Nouns and Countability",
      journal = "Language",
      year = 1980,
      volume = 56,
      number = 3,
      pages = "541--67"
    }

    On the computational side, I have been looking at countability for
    Japanese-to-English MT, and suggest splitting it into five types (with
    a couple of sub-types): fully countable, strongly countable, weakly
    countable, uncountable, and plural only.
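
    As a rough illustration only, the five types could be encoded as a
    label set for annotation. This is a minimal sketch in Python; the
    names Countability and parse_countability are hypothetical and do
    not come from the papers below, and the sub-types are left out
    because they are not listed here.

```python
from enum import Enum


class Countability(Enum):
    """The five countability types named above (sub-types omitted)."""
    FULLY_COUNTABLE = "fully countable"
    STRONGLY_COUNTABLE = "strongly countable"
    WEAKLY_COUNTABLE = "weakly countable"
    UNCOUNTABLE = "uncountable"
    PLURAL_ONLY = "plural only"


def parse_countability(label: str) -> Countability:
    """Map a free-text label to a type, ignoring case and extra spaces.

    Raises ValueError if the label is not one of the five types.
    """
    normalized = " ".join(label.lower().split())
    return Countability(normalized)
```

    A labeled lexicon would then just map each noun to one of these
    values; which noun gets which label is a matter for the annotation
    guidelines, not for this sketch.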

    I discuss these in several papers and my dissertation:

    @inproceedings{Bond:1994,
      author = "Francis Bond and Kentaro Ogura and Satoru Ikehara",
      title = "Countability and Number in {Japanese}-to-{English}
                      Machine Translation",
      booktitle = coling-94,
      year = "1994",
      address = "Kyoto",
      month = aug,
      pages = "32--38",
      note = "(\url{http://xxx.lanl.gov/abs/cmp-lg/9511001})",
      organization = "The International Committee on Computational
                      Linguistics (ICCL)"
    }
    @Article{Bond:1998,
      author = "Francis Bond and Kentaro Ogura",
      title = "Reference in {Japanese}-to-{English} Machine
                      Translation",
      journal = MT,
      volume = 13,
      number = "2--3",
      year = 1998,
      pages = "107--134"
    }
    @PhDThesis{Bond:2001,
      author = "Francis Bond",
      title = "Determiners and Number in {English} contrasted with
                      {Japanese} --- as exemplified in Machine
                      Translation",
      school = "University of Queensland",
      year = 2001,
      address = "Brisbane, Australia"
    }

    Ann Copestake also talks a bit about countability in her dissertation
    and other publications too numerous to mention:

    @PhdThesis{Copestake:1992z,
      author = "Ann Copestake",
      title = "The Representation of Lexical Semantic Information",
      school = "University of Sussex",
      year = 1992,
      address = "Brighton"
    }

    As far as I know there isn't any labeled data generally available, but
    I would be happy to be proved wrong.

    -- 
    Francis Bond  <www.kecl.ntt.co.jp/icl/mtg/members/bond/>
    NTT Communication Science Laboratories | Machine Translation Research Group
    


