Corpora: Re: HTML Concordancing

From: Andrew Kehoe (andrew@rdues.liv.ac.uk)
Date: Tue May 09 2000 - 17:28:29 MET DST

    Maritza,

    It seems that most of the technology you require is already present in our
    prototype WebCorp web concordancing software. We have modified the existing tool
    to produce word (frequency) lists for web pages. A demonstrator can be found
    at http://webcorp.connect.org.uk/wordlist.html; it constructs a word list
    for an individual target page.
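
    For anyone who wants to experiment locally, the general idea (strip the
    HTML markup, then count the remaining words) can be sketched in a few
    lines of Python. This is only an illustration under my own assumptions,
    not the WebCorp implementation: the names TextExtractor and word_list are
    hypothetical, the tokenizer is a crude English-only regular expression,
    and the text of <script> and <style> elements is simply discarded.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects page text, skipping script/style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []        # text fragments found between tags
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_list(html):
    """Return a Counter mapping each lower-cased word to its frequency."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps text from adjacent elements apart; the
    # trade-off is that a word split by inline markup is counted as two.
    text = " ".join(parser.chunks).lower()
    # Assumption: a "word" is a run of ASCII letters, optionally followed
    # by an apostrophe and more letters (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return Counter(words)
```

    Calling word_list(page_source).most_common(20) would then give the twenty
    most frequent words of a page whose HTML source is held in the string
    page_source. Fetching the page and handling character encodings are left
    out of this sketch.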

    Regards,

    Mike Pacey,
    R&D Unit for English Studies,
    University of Liverpool

    > From owner-corpora@lists.uib.no Tue May 9 12:10 BST 2000
    > From: "Maritza vd Heuvel" <MVDH@AKAD.SUN.AC.ZA>
    > To: Corpora@hd.uib.no
    > Date: Tue, 9 May 2000 12:46:32 +0200
    > MIME-Version: 1.0
    > Content-transfer-encoding: 7BIT
    > Subject: Corpora: Html Concordancing?
    >
    > Hi
    >
    > Let me start off by introducing myself. I'm a postgrad researcher
    > working on a lexicon for the speech recognition component of a
    > spoken dialogue system. The electronic material available for use
    > in corpora and for concordancing purposes is very limited and one
    > of our options is using web sites containing relevant information to
    > generate word lists. Does anyone know of a concordancing tool
    > that allows concordancing of files that contain html tags without
    > first requiring conversion of the html into a text format?
    >
    > Thanks!
    > Maritza van den Heuvel


