RE: [Corpora-List] Corpora for EAP: Architecture...?

From: Eric Atwell (eric@comp.leeds.ac.uk)
Date: Mon Jan 16 2006 - 11:26:07 MET


    The BootCat and WACKY (Web-as-Corpus Kool Ynitiative) tools are Perl scripts.
    Does anyone know of equivalents in Python? E.g., is anyone developing
    web-as-corpus extras for the Python Natural Language Toolkit?

    I want to set a Web-as-Corpus data-mining/analysis coursework exercise for
    my "Technologies for Knowledge Management" module next semester;
    these Computing undergrads are familiar with Python and Java, but not Perl.

    Alternatively, when will the public web-based version of BootCat be
    available?... (and will it cope with 70 computing students testing it?!)

    Eric Atwell, School of Computing, Leeds University

    On Mon, 16 Jan 2006, Adam Kilgarriff wrote:

    > Dear Nigel,
    >
    > Do you know BootCat tools? They allow you to prepare special-language
    > corpora from web pages automatically. See
    > http://sslmit.unibo.it/~baroni/bootcat.html
    >
    > We are currently preparing a web-service version of the tool, so then you
    > can enter ‘seed’ terms and then produce a corpus in that area by clicking
    > the “go” button. Public version to follow before long. In the meantime, if
    > you give me half a dozen relevant architecture terms (single words or multi
    > words, and selected to avoid picking up non-architecture hits) I’ll make a
    > small sample corpus and point you to it.
    >
    > Adam Kilgarriff
    >
    > ...
    >
    > SERGE SHAROFF
    > In my view the best option is to collect the corpus you want
    > automatically using BootCat tools:
    > http://wacky.sslmit.unibo.it/
    >

    -- 
    Eric Atwell, Senior Lecturer, Language research group, School of Computing,
    Faculty of Engineering, University of Leeds, LEEDS LS2 9JT, England
    TEL: +44-113-2335430  FAX: +44-113-2335468  http://www.comp.leeds.ac.uk/eric
    


