Re: [Corpora-List] Re: [Corpora-list] Incidence of MWEs

From: Kit Chun Yu (ctckit@cityu.edu.hk)
Date: Sat Mar 18 2006 - 02:39:15 MET

Next message: Will Fitzgerald: "Re: [Corpora-List] Re: [Corpora-list] Incidence of MWEs"

Previous message: Santos Diana: "RE: [Corpora-List] Re: [Corpora-list] Incidence of MWEs"
In reply to: Lou Burnard: "Re: [Corpora-List] Re: [Corpora-list] Incidence of MWEs"
Next in thread: Will Fitzgerald: "Re: [Corpora-List] Re: [Corpora-list] Incidence of MWEs"
Next in thread: Santos Diana: "RE: [Corpora-List] Re: [Corpora-list] Incidence of MWEs"
Reply: Will Fitzgerald: "Re: [Corpora-List] Re: [Corpora-list] Incidence of MWEs"

why not think about these kinds of issues from the perspective of
tokenization for NLP?
(a very old paper: Webster & Kit, "Tokenization as the initial phase in
NLP", COLING-92, pp. 1106-1110.)
a very simple idea: anything that is not to be further decomposed into
smaller fragments is simply treated as a token.
what counts as a token (an atomic text unit, which may still have its own
internal structure) seems to be application-dependent.
we may have mono-word and multi-word tokens, and the multi-word ones
(i.e. MWEs) may be continuous or discontinuous (noncontiguous).
accordingly, we can have something like this for tagging:
<t ..> <w ..>... </w> <w ..>... </w> ... </t>
we may need some more sophisticated tagging for discontinuous ones, of
course.
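
to make the idea concrete, here is a minimal sketch in Python, covering contiguous multi-word tokens only. the lexicon entries, the function names (tokenize, to_markup) and the greedy longest-match strategy are all assumptions made for illustration; they are not taken from the paper cited above, and the <t>/<w> elements are written without attributes.

```python
from xml.sax.saxutils import escape

# Hypothetical multi-word lexicon; the entries are illustrative assumptions.
MWE_LEXICON = {("in", "spite", "of"), ("kick", "the", "bucket"), ("hong", "kong")}
MAX_LEN = max(len(entry) for entry in MWE_LEXICON)


def tokenize(words):
    """Group words into tokens by greedy longest match against MWE_LEXICON.

    Each token is a list of one or more words; a word that starts no
    lexicon entry becomes a mono-word token.
    """
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, falling back to one word.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            span = words[i:i + n]
            if n == 1 or tuple(w.lower() for w in span) in MWE_LEXICON:
                tokens.append(span)
                i += n
                break
    return tokens


def to_markup(tokens):
    """Render tokens as <t> elements, each wrapping one <w> per word."""
    parts = []
    for token in tokens:
        inner = " ".join(f"<w>{escape(w)}</w>" for w in token)
        parts.append(f"<t>{inner}</t>")
    return " ".join(parts)


if __name__ == "__main__":
    sentence = "He left in spite of the rain".split()
    print(to_markup(tokenize(sentence)))
```

here "in spite of" comes out as a single <t> holding three <w> elements, while every other word gets a <t> of its own. a different application would simply plug in a different lexicon, which is the application-dependence mentioned above. discontinuous tokens are deliberately left out of this sketch.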
just to put in my two cents.
best,

Chunyu Kit, PhD
Assistant Professor in Computational Linguistics

Dept. of Chinese, Translation & Linguistics
City University of Hong Kong
83 Tat Chee Ave., Kowloon

E-mail:ctckit@cityu.edu.hk
http://personal.cityu.edu.hk/~ctckit/
Fax: (+852)2788 8706, 2788 8732
Tel: (+852)2788 9310 (O), 9380 1738 (M)
(+86)136 5881 2972 (China Mobile)

