[Corpora-List] CFP: Computationally Hard Problems in Speech and Language Processing

From: Hal Daume III (hdaume@ISI.EDU)
Date: Mon Dec 12 2005 - 22:37:44 MET

Next message: Jun Lang: "[Corpora-List] Chiniese Name Gender Recognition"

Previous message: Briony Williams: "Re: [Corpora-List] Query about corpora of spoken English"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

-----------------------------------------------------------------
COMPUTATIONALLY HARD PROBLEMS IN SPEECH AND LANGUAGE PROCESSING:
EXPRESSIVENESS VS. TRACTABILITY IN LEARNING AND INFERENCE
-----------------------------------------------------------------

WEBSITE: http://www.cis.upenn.edu/~ryantm/naaclWS06/

---------------------
Call for Papers
---------------------

Modern machine learning methods (such as maximum entropy models,
decision trees and support vector machines) have provided the language
processing community with robust tools for supervised learning
problems with simple outputs. These methods provide natural
mechanisms for representing linguistic knowledge through weighted
features and obtain high accuracy by maximizing performance on
training data.

Unfortunately, when one wishes to apply similarly robust techniques to
problems with complex outputs, one is limited to models with nice
computational properties, such as those that can be formulated in
terms of sequence or tree representations. Even for these relatively
simple structures, polynomial time algorithms exist only under strong
restrictions on the locality of features (the Markov assumption for
sequences and the context free assumption for trees). Moreover, even
under such assumptions, the complexity of these algorithms can grow
unwieldy even while remaining polynomial, for instance in the case of
synchronous or lexicalized context free grammars.

Recent work on ranking, sampling and other approximate solutions to
such problems have indicated that researchers are coming back to the
hard problems in speech and text, for which efficient algorithms do
not (or are not known to) exist. Some might even argue that language
processing has more to gain from richer and more global feature
representations than it does from new machine learning solutions.

The purpose of this workshop will be to explore new research aimed at
identifying and solving speech and language processing problems that do
not have practical computational properties. We also wish to explore
the adaptation of current state-of-the-art inference and learning
algorithms to problems beyond sequence and tree analysis. In
particular the workshop will have the following themes:

PROBLEM IDENTIFICATION AND SPECIFICATION

- Formal investigations on the computational properties of new and old
problems in speech, parsing, translation, summarization, information
extraction and other common language processing areas.

- Moving beyond the Markov and context-free assumptions of our
   underlying representations. Identifying the linguistic
   (im)plausibility of these assumptions. Global feature functions
   that incorporate much richer information about sentence and document
   level properties as well as long distance dependencies.

- Joint representations such as those incorporating both syntax
(phrase-structure, dependencies, etc.) and semantics (entities,
relations, predicate-arguments).

- Efficient solutions to problems that have historically been viewed
as intractable.

- Appropriate (natural) representations of problems, and how this
effects complexity/performance. Structure of search.

- Identifying when old solutions (e.g. classification, sequential
labeling) to new problems are appropriate and when more flexible
solutions are required.

LEARNING AND INFERENCE

- Approximate inference algorithms. The pros and cons of pruning
   techniques. New approximation algorithms for hard problems
   including those based on beam search, sampling or other motivated
   methods. Surveys of known approximation and pruning methods
   identifying specific properties of success and failure. Efficiency
   vs. performance trade-offs. Formal characterizations of search
   errors, and their relation to complexity.

- Approximate parameter estimation. Training techniques for models in
   which parameter estimation is intractable for both generative and
   discriminative frameworks. The effects of approximate parameter
   estimation on performance. Learning parameters with respect to
   approximation in search. Batch vs. online learning techniques when
   combined with approximate inference.

- Joint structures and complex systems. When are pipelined methods
   sufficient? When do we need joint learning and inference to achieve
   maximum benefit? Are the benefits of joint inference and learning
   for factorizable structures worth the computational cost?

- The trade off between loss functions and tractability. When is it
   necessary to use structure, or can components be predicted
   separately? Theoretical or practical comparisons between
   computational complexity and performance for different loss
   functions.

- Differences between complex one-pass systems versus
stacked/multi-pass "decoding."

We encourage the submission of papers that address any of the above
themes.

This workshop is interested in completed work as well as works in
progress.

Submissions should be a maximum of 8 pages and should use the
HLT-NAACL formatting guidelines that can be obtained here.
Furthermore, the reviewing process will be blind so names and
affiliations need to be removed from the title page as well as
all self-references that reveal the author's identity.

Submissions should be sent to ryantm@cis.upenn.edu and should
have "HLT-NAACL Workshop Submission" in the title.

Dates

- March 3, 2006: Paper submissions due by 11:59pm EST
- April 7, 2006: Paper acceptance notification
- April 21, 2006: Camera ready versions due
- June 9, 2006: Workshop

Workshop Website: http://www.cis.upenn.edu/~ryantm/naaclWS06/

--------------------------------
Organizers and PC
--------------------------------

Organizers

     * Ryan McDonald (UPenn) - contact ryantm@cis.upenn.edu
     * Hal Daumé III (ISI)
     * Fernando Pereira (UPenn)

Program Committee

     * Jeff Bilmes (Washington)
     * Michael Collins (MIT)
     * Jason Eisner (Johns Hopkins)
     * Radu Florian (IBM)
     * Liang Huang (UPenn)
     * Thorsten Joachims (Cornell)
     * Chris Quirk (Microsoft)
     * Dan Roth (UIUC)
     * Noah Smith (Johns Hopkins)
     * Charles Sutton (UMass)
     * Ben Taskar (Berkeley)

Next message: Jun Lang: "[Corpora-List] Chiniese Name Gender Recognition"
Previous message: Briony Williams: "Re: [Corpora-List] Query about corpora of spoken English"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Mon Dec 12 2005 - 22:45:15 MET