Re: [Corpora-List] BNC raw text

From: Grzegorz Chrupała (pitekus@gmail.com)
Date: Thu Sep 22 2005 - 11:25:03 MET DST

Next message: santinim@inwind.it: "Re:[Corpora-List] The genre of the Web"

Previous message: Robert Rittman: "[Corpora-List] BNC raw text"
In reply to: Robert Rittman: "[Corpora-List] BNC raw text"
Next in thread: Jakob Halskov: "Re: [Corpora-List] BNC raw text"

Hi Robert,
Why don't you just extract the plain text from the marked-up files? It
should be pretty trivial if you use some SGML library.
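
A minimal sketch of that idea in Python, using only the standard library
rather than a real SGML parser. The function name strip_sgml and the sample
markup in the usage note are illustrative assumptions, not taken from the
corpus documentation. It assumes no tag contains a literal ">" inside an
attribute value, and it only decodes the entities that Python's html module
knows about, so corpus-specific entities would need their own mapping.

```python
import html
import re

# Anything between "<" and the next ">" is treated as a tag.
# Assumption: attribute values never contain a literal ">".
TAG_RE = re.compile(r"<[^>]*>")
WS_RE = re.compile(r"\s+")


def strip_sgml(markup: str) -> str:
    """Return the text of `markup` with tags removed (hypothetical helper)."""
    text = TAG_RE.sub("", markup)       # drop every tag, keep the text between
    text = html.unescape(text)          # decode entities such as &amp;
    return WS_RE.sub(" ", text).strip() # collapse runs of whitespace
```

For example, strip_sgml("<s n=1><w AT0>The <w NN1>cat <w VVD>sat<c PUN>.")
gives "The cat sat." on that made-up sample. Metadata in a document header
would still come through as text, so the header would have to be cut off
separately before calling the function.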
Best,

--
Grzegorz Chrupała ♦ pithekos.net
On 22/09/05, Robert Rittman <robert.rittman@gmail.com> wrote:
> I am working with the British National Corpus - World Edition CD-ROM. The CD
> does not contain the raw text of the 4,000+ documents. It only contains
> tagged text in SGML format (including metadata). Does anyone know where I
> can obtain the raw (untagged) text in plain text format?
>  Thank you,
>  Robert Rittman
> PhD Candidate
> School of Communication, Information and Library Studies
> Rutgers University


This archive was generated by hypermail 2b29 : Thu Sep 22 2005 - 11:36:58 MET DST