Re: [Corpora-List] BNC raw text

From: Grzegorz Chrupała (pitekus@gmail.com)
Date: Thu Sep 22 2005 - 11:25:03 MET DST


    Hi Robert,
    Why don't you just extract the plain text from the marked-up files? It
    should be fairly straightforward if you use an SGML library.
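
    As a rough illustration of the idea, here is a minimal Python sketch. It
    does not use a real SGML parser: it simply strips tags with a regular
    expression, which is an assumption that only holds if no attribute value
    contains a ">" character. The function name strip_markup and the header
    element names ("teiHeader", "header") are hypothetical choices for this
    example, not taken from the BNC documentation, so check them against the
    actual files. Entities are decoded with the standard library's HTML
    entity table; any BNC-specific entity outside that table is left as-is.

```python
import html
import re

# Matches any SGML tag, e.g. <w NN1> or </s>. Assumes no ">" inside attributes.
TAG = re.compile(r"<[^>]*>")


def strip_markup(sgml, header_tags=("teiHeader", "header")):
    """Return the text content of an SGML string with tags removed.

    header_tags lists element names (assumed, not verified against the BNC
    DTD) whose whole content is metadata and should be dropped.
    """
    # Drop each metadata header wholesale, including its content.
    for name in header_tags:
        sgml = re.sub(
            rf"<{name}\b.*?</{name}>",
            " ",
            sgml,
            flags=re.DOTALL | re.IGNORECASE,
        )
    # Remove the remaining tags but keep the text between them.
    text = TAG.sub("", sgml)
    # Decode character entities such as &amp; or &pound;.
    text = html.unescape(text)
    # Collapse runs of whitespace left behind by the removed markup.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = (
        "<bncDoc id=A00><header type=text><title>Sample</title></header>"
        "<s n=1><w AT0>The <w NN1>cat <w VVD>sat<c PUN>.</s></bncDoc>"
    )
    print(strip_markup(sample))
```

    To process the whole corpus you would read each file, pass its contents
    to strip_markup, and write the result to a .txt file. A proper SGML
    parser is the safer route if the markup turns out to be less regular.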
    Best,

    --
    Grzegorz Chrupała ♦ pithekos.net
    

    On 22/09/05, Robert Rittman <robert.rittman@gmail.com> wrote:
    > I am working with the British National Corpus - World Edition CD-ROM. The CD
    > does not contain the raw text of the 4,000+ documents. It only contains
    > tagged text in SGML format (including metadata). Does anyone know where I
    > can obtain the raw (untagged) text in plain text format?
    > Thank you,
    > Robert Rittman
    > PhD Candidate
    > School of Communication, Information and Library Studies
    > Rutgers University


