RE: [Corpora-List] Chiniese Name Gender Recognition

From: Mark Lewellen (lewellen@erols.com)
Date: Wed Dec 21 2005 - 19:39:14 MET


    Good point, Yorick. However, the results of
    Jun's method could also reflect a situation in
    which the majority of name occurrences are possible
    to analyze in this way, while a minority are not.
    (I believe this to be the case.)
    There are many common Chinese given names that are
    reliably male/female, as well as rare or novel names
    that could be guessed at. Another Zipfian distribution!
    In this case, with a very long tail, since anything
    can be used. (There are wild examples in the literature,
    such as names that reflected political slogans during the
    Cultural Revolution, or disgusting names meant to ward
    off evil demons.)
    I think that sometimes, given the great success of
    statistical methods, we expect them to work magic in
    every instance...however, I think this is a case of a
    problem domain that can only yield a partial solution.
    A similarly problematic domain is identifying the
    Chinese characters of a name, given only the romanization
    (when multiple romanizations of multiple Chinese
    languages/dialects are considered).
    In such problem domains, it would be useful to
    present confidence measures (in human terms: "I'm sure
    this is a female name", or "Could possibly be male--no
    data to back this up--but it could be associated with
    'male' traits").
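
    To make the idea of a confidence measure concrete, here is a
    minimal sketch, not Jun's method: a character-level naive Bayes
    scorer with add-one smoothing that reports a tier instead of a
    bare label. The training pairs, the function names and the 0.8
    threshold are all made up for illustration; a real system would
    need a large labelled name list.

```python
import math
from collections import Counter

# Toy, invented training pairs (given name, gender label).
# Illustrative assumption only; not drawn from any real corpus.
TRAIN = [
    ("丽芳", "F"), ("秀娟", "F"), ("丽娟", "F"),
    ("强军", "M"), ("志刚", "M"), ("志强", "M"),
]


def train(examples):
    """Count characters per gender label."""
    char_counts = {"F": Counter(), "M": Counter()}
    label_counts = Counter()
    for name, label in examples:
        label_counts[label] += 1
        char_counts[label].update(name)  # one count per character
    vocab = set(char_counts["F"]) | set(char_counts["M"])
    return char_counts, label_counts, vocab


def posterior_female(name, model):
    """P(female | characters) under naive Bayes with add-one smoothing."""
    char_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_score = {}
    for label in ("F", "M"):
        score = math.log(label_counts[label] / total)
        # +1 reserves probability mass for characters never seen in training
        denom = sum(char_counts[label].values()) + len(vocab) + 1
        for ch in name:
            score += math.log((char_counts[label][ch] + 1) / denom)
        log_score[label] = score
    top = max(log_score.values())
    weight = {k: math.exp(v - top) for k, v in log_score.items()}
    return weight["F"] / (weight["F"] + weight["M"])


def describe(name, model, sure=0.8):
    """Map the posterior to a human-readable confidence tier."""
    _, _, vocab = model
    if not any(ch in vocab for ch in name):
        return "no data to back up a guess"
    p = posterior_female(name, model)
    if p >= sure:
        return "confident: female"
    if p <= 1 - sure:
        return "confident: male"
    return "ambiguous"
```

    The separate "no data" tier is the point of the sketch: names in
    the long tail get an explicit abstention rather than a forced
    guess, so accuracy can be reported only over the names the model
    claims to be sure about.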

    Mark

    > -----Original Message-----
    > From: Yorick Wilks [mailto:yorick@dcs.shef.ac.uk]
    > Sent: Wednesday, December 21, 2005 12:39 PM
    > To: lewellen@erols.com
    > Cc: 'Jun Lang'; 'Xiaofei Lu'; corpora@uib.no
    > Subject: Re: [Corpora-List] Chiniese Name Gender Recognition
    >
    >
    > If Jun's method gets 70% name-gender right, that alone
    > suggests there is some real gender bias in the symbols that a
    > statistical method, algorithm, or even native speaker
    > could indeed model, and does!
    > Yorick Wilks
    >
    >
    >
    > On 21 Dec 2005, at 15:49, Mark Lewellen wrote:
    >
    > > Since Chinese given names are not limited to a set of
    > > lexical items that are prototypically 'names' (i.e. they
    > > can be just about any lexical item), Chinese given names,
    > > as you probably know, often have no clue about gender.
    > > There has been some discussion on 'traits' that are
    > > more feminine or masculine and would be reflected in names,
    > > but there remains a lot of ambiguity. I doubt there is any
    > > statistical method, algorithm, or even native speaker that
    > > can make up for that problem!
    > >
    > > Mark Lewellen
    > >
    > >
    > >> -----Original Message-----
    > >> From: owner-corpora@lists.uib.no
    > >> [mailto:owner-corpora@lists.uib.no] On Behalf Of Jun Lang
    > >> Sent: Tuesday, December 13, 2005 7:31 AM
    > >> To: 'Xiaofei Lu'
    > >> Cc: corpora@uib.no
    > >> Subject: [Corpora-List] Re: [Corpora-List] Chiniese Name
    > >> Gender Recognition
    > >>
    > >>
    > >> Yeah! There are many names which could be used for both male
    > >> and female. It is a difficult problem. So far I have done some
    > >> simple research on this topic, and recently I have been trying
    > >> to get more data. Since the parameter space is very large,
    > >> decision trees cannot reach a final result quickly. I want to
    > >> try a Bayes model again.
    > >>
    > >> Can you give me some ideas about it? Thanks a lot!
    > >>
    > >> Best wishes,
    > >> Jun Lang
    > >>
    > >> -----Original Message-----
    > >> From: Xiaofei Lu [mailto:xflu@ling.ohio-state.edu]
    > >> Sent: December 13, 2005 13:56
    > >> To: Jun Lang
    > >> Subject: Re: [Corpora-List] Chiniese Name Gender Recognition
    > >>
    > >> Interesting. What is the baseline, and how do you establish it?
    > >> Many names can be either male or female, can't they?
    > >>
    > >> On Tue, 13 Dec 2005, Jun Lang wrote:
    > >>
    > >>
    > >>> Hi all Corpora Members,
    > >>>
    > >>> I am now studying Chinese name gender recognition. The input
    > >>> is a Chinese name. The output is the corresponding gender. I
    > >>> used the decision tree method, but the final accuracy is only
    > >>> about 70%.
    > >>>
    > >>> Do you know of any other method which can achieve higher
    > >>> accuracy? And has anybody done any similar research?
    > >>>
    > >>> Thanks a lot!
    > >>>
    > >>> Best wishes,
    > >>>
    > >>> Bill_Lang(Jun Lang): Ph.D Candidate
    > >>> Information Retrieval Laboratory
    > >>> Harbin Institute of Technology
    > >>> Mail: bill_lang@gmail.com
    > >>> Homepage: http://ir.hit.edu.cn/~bill_lang


