[Corpora-List] efficient decision tree tool?

From: Caren Brinckmann (cabr@coli.uni-sb.de)
Date: Thu Jan 19 2006 - 02:12:29 MET

    Dear all,

    We are currently working on corpus-based models of duration, F0,
    intensity, and segmental reductions in read and spontaneous speech. For
    the first part of our study we will use decision trees.

    Since our database is fairly large, I am looking for an efficient decision
    tree tool with the following features:

    * nominal and numeric input features and target variables (i.e. both
    classification and regression trees)
    * binary as well as multi-way splits
    * efficient handling of large datasets (200,000 cases/records/instances
    with up to 100 attributes/features/variables)
    * nice to have: integrated feature selection algorithm

    In previous studies, I've worked with "wagon" from the Edinburgh Speech
    Tools Library (http://www.cstr.ed.ac.uk/projects/speech_tools/) and "J48"
    from Weka (http://www.cs.waikato.ac.nz/ml/weka/). While wagon is very fast
    and memory-efficient, it allows only binary splits (as far as I know).
    Weka allows multi-way splits but is too slow and memory-intensive for our
    current datasets.

    I'm looking forward to your suggestions!

    Kind regards,

    Caren.

    P.S.: If you know of any other mailing list or forum where I could post my
    question, please let me know.

    --
    Caren Brinckmann
    Saarland University, FR 4.7 Institute of Phonetics
    P.O.Box 151150, 66041 Saarbruecken, Germany
    Phone: +49-681-3024244, Fax: +49-681-3024684
    


