Hi Robert,
Why don't you just extract the plain text from the marked-up files? It
should be pretty trivial if you use an SGML library.
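
As a rough illustration of the suggestion above, here is a minimal Python sketch. It is not a real SGML parser: it assumes the standard library's tolerant html.parser.HTMLParser is good enough to walk the tags, and it assumes the metadata sits inside an element named "header". The names TagStripper, sgml_to_text and skip_tags are made up for this example. Corpus-specific SGML entities that HTMLParser does not recognise are left in the output unchanged and would need their own mapping.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect character data, dropping all tags and any skipped elements."""

    def __init__(self, skip_tags=("header",)):
        # convert_charrefs=True turns known entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        # Assumption: metadata lives inside these elements (lowercase names,
        # because HTMLParser lowercases tag names before reporting them).
        self.skip_tags = set(skip_tags)
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self.skip_depth == 0:
            self.chunks.append(data)


def sgml_to_text(markup, skip_tags=("header",)):
    """Return the untagged text of `markup` with whitespace normalised."""
    parser = TagStripper(skip_tags)
    parser.feed(markup)
    parser.close()  # flush any text still buffered at end of input
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    # Hypothetical fragment in the general style of word-tagged SGML.
    sample = (
        "<doc><header>metadata here</header>"
        "<text><s n=001><w AT0>The <w NN1>cat <w VVD>sat<c PUN>.</text></doc>"
    )
    print(sgml_to_text(sample))
```

To process a whole corpus one would loop over the files, read each with the right encoding, and write the result of sgml_to_text to a parallel .txt file. If the header element omits its end tag in the real data, the skip logic above would need adjusting.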
Best,
-- Grzegorz Chrupała ♦ pithekos.net

On 22/09/05, Robert Rittman <robert.rittman@gmail.com> wrote:
> I am working with the British National Corpus - World Edition CD-ROM. The CD
> does not contain the raw text of the 4,000+ documents. It only contains
> tagged text in SGML format (including metadata). Does anyone know where I
> can obtain the raw (untagged) text in plain text format?
> Thank you,
> Robert Rittman
> PhD Candidate
> School of Communication, Information and Library Studies
> Rutgers University
This archive was generated by hypermail 2b29 : Thu Sep 22 2005 - 11:36:58 MET DST