Corpora: Re: HTML Concordancing

From: Andrew Kehoe (andrew@rdues.liv.ac.uk)
Date: Tue May 09 2000 - 17:28:29 MET DST


Maritza,

Most of the technology you require already exists in our prototype WebCorp web
concordancing software. We have modified the existing tool to produce word
(frequency) lists for web pages. A demonstrator is available at
http://webcorp.connect.org.uk/wordlist.html; it constructs a word list for an
individual target page.
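
To illustrate the general idea of building a word list straight from a page
that still contains HTML tags, here is a minimal Python sketch. It is not the
WebCorp code and makes no claim about how WebCorp works internally; the names
TextExtractor and word_list, the sample page and the simple tokenising pattern
are all assumptions made for the example. It uses only the standard library.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping script/style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not inside <script> or <style>.
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_list(html):
    """Return a Counter of lower-cased word frequencies for an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space means a word split by inline tags counts as two.
    text = " ".join(parser.chunks)
    # Assumed tokenisation: runs of ASCII letters, with an optional apostrophe part.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


# Made-up sample page, for illustration only.
sample = (
    "<html><body><h1>Speech</h1>"
    "<p>Speech recognition needs a <b>lexicon</b>.</p></body></html>"
)
freq = word_list(sample)
for word, count in freq.most_common():
    print(count, word)
```

The tags are never written out to a separate text file; the parser simply
ignores them while the counts are built. A real tool would also need to deal
with character encodings, non-English alphabets and fetching the page itself.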

Regards,

Mike Pacey,
R&D Unit for English Studies,
University of Liverpool

> From owner-corpora@lists.uib.no Tue May 9 12:10 BST 2000
> From: "Maritza vd Heuvel" <MVDH@AKAD.SUN.AC.ZA>
> To: Corpora@hd.uib.no
> Date: Tue, 9 May 2000 12:46:32 +0200
> MIME-Version: 1.0
> Content-transfer-encoding: 7BIT
> Subject: Corpora: Html Concordancing?
>
> Hi
>
> Let me start off by introducing myself. I'm a postgrad researcher
> working on a lexicon for the speech recognition component of a
> spoken dialogue system. The electronic material available for use
> in corpora and for concordancing purposes is very limited and one
> of our options is using web sites containing relevant information to
> generate word lists. Does anyone know of a concordancing tool
> that allows concordancing of files that contain html tags without
> first requiring conversion of the html into a text format?
>
> Thanks!
> Maritza van den Heuvel

