Re: Corpora: sgml detagger

From: Danko Sipka (sipkadan@main.amu.edu.pl)
Date: Tue Apr 16 2002 - 20:31:35 MET DST


    Hi:
    This Perl script should do the job:

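    use strict;
    use warnings;

    print "What is your input file name:\n";
    chomp(my $infile = <STDIN>);
    # Three-argument open treats the name literally (no mode characters)
    open my $in,  '<', $infile       or die "No file, no fun! ($infile: $!)";
    open my $out, '>', "$infile.out" or die "No file, no fun! ($infile.out: $!)";
    while (<$in>) {
        # Delete everything from '<' to the nearest '>' on the same line;
        # a tag split across two lines is left untouched
        s/<.+?>//g;
        print $out $_;
    }
    close($in)  or die "D'oh! $!";
    close($out) or die "D'oh! $!";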
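The loop above reads one line at a time, so a tag that spans several lines (e.g. `<text` on one line and `lang=en>` on the next) is missed. A variant along these lines, offered as a sketch rather than something tested on your data, reads each file whole, takes the file names from the command line, and writes each result to NAME.txt. The script name `detag.pl`, the `.txt` naming, and the short entity list are assumptions; real SGML DTDs can define many more entities, and a literal '>' inside an attribute value would still confuse the pattern.

```perl
#!/usr/bin/perl
# Sketch: strip SGML tags from every file named on the command line and
# write the remaining text to "<name>.txt". Assumptions: tags contain no
# literal '>', and only a handful of common entities need decoding.
use strict;
use warnings;

# Remove comments and tags (even multi-line ones), then decode basic entities.
sub detag {
    my ($text) = @_;
    $text =~ s/<!--.*?-->//gs;     # drop comments; /s lets '.' match newlines
    $text =~ s/<[^>]*>//g;         # drop tags; [^>] also matches newlines
    my %entity = (lt => '<', gt => '>', quot => '"', apos => "'");
    $text =~ s/&(lt|gt|quot|apos);/$entity{$1}/g;
    $text =~ s/&amp;/&/g;          # last, so "&amp;lt;" ends up as literal "&lt;"
    return $text;
}

for my $infile (@ARGV) {
    open my $in, '<', $infile or die "Cannot read $infile: $!";
    my $text = do { local $/; <$in> };   # undef $/ slurps the whole file
    close $in;

    my $outfile = "$infile.txt";         # e.g. doc.sgml -> doc.sgml.txt
    open my $out, '>', $outfile or die "Cannot write $outfile: $!";
    print {$out} detag($text);
    close $out or die "Cannot close $outfile: $!";
}
```

Saved as `detag.pl`, it could be run over a whole directory with `perl detag.pl *.sgml`. Entities are decoded only after the tags are gone, so an escaped `&lt;p&gt;` in the running text survives as `<p>` instead of being deleted as a tag.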

    Best,

    Danko Sipka
    sipkadan@main.amu.edu.pl | Danko.Sipka@asu.edu
    http://main.amu.edu.pl/~sipkadan | http://www.public.asu.edu/~dsipka

      ----- Original Message -----
      From: Tine & Colleen
      To: CORPORA@HD.UIB.NO
      Sent: Tuesday, April 16, 2002 8:13 PM
      Subject: Corpora: sgml detagger

      Hi
      I am compiling a corpus for research purposes, and some of the texts are SGML-tagged.
      Does anybody know an easy way to remove the tags and save the texts as 'raw' .txt files?
      Maybe a Perl script?

      Thanks in advance

      Tine Lassen
      Copenhagen



    This archive was generated by hypermail 2b29 : Tue Apr 16 2002 - 20:24:03 MET DST