Hi:
This Perl script should do the job:
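use strict;
use warnings;
# Ask for the input file; the stripped text is written to <input>.out
print "What is your input file name:\n";
chomp(my $infile = <STDIN>);
open(my $in, '<', $infile) or die "No file, no fun! Cannot open $infile: $!";
open(my $out, '>', "$infile.out") or die "No file, no fun! Cannot create $infile.out: $!";
while (my $line = <$in>) {
    # Remove every <...> tag on this line; a tag that spans a line break is not matched
    $line =~ s/<.+?>//g;
    print $out $line;
}
close($in) or die "D'oh! Cannot close $infile: $!";
close($out) or die "D'oh! Cannot close $infile.out: $!";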
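One caveat: because the script above works line by line, a tag that is split over two lines is left in the output. Below is a minimal sketch of a variant that reads the whole file at once, assuming the files are small enough to fit in memory. The helper name strip_tags and the use of a command-line argument instead of a prompt are made up for illustration.

```perl
use strict;
use warnings;

# strip_tags is a hypothetical helper name chosen for this sketch.
# It takes the full text of a file and returns it with every <...> span removed.
sub strip_tags {
    my ($text) = @_;
    # [^>]* also matches newlines, so a tag broken across lines is removed whole.
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Usage sketch: the input file name is taken from the command line (an assumption).
if (@ARGV) {
    my $infile = shift @ARGV;
    open(my $in, '<', $infile) or die "Cannot open $infile: $!";
    my $text = do { local $/; <$in> };   # with $/ undefined, one read returns the whole file
    close($in) or die "Cannot close $infile: $!";
    open(my $out, '>', "$infile.out") or die "Cannot create $infile.out: $!";
    print $out strip_tags($text);
    close($out) or die "Cannot close $infile.out: $!";
}
```

This is still only a pattern match, not an SGML parser: a ">" inside a quoted attribute value will end the match early, and entities such as &amp; are left untouched.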
Best,
Danko Sipka
sipkadan@main.amu.edu.pl | Danko.Sipka@asu.edu
http://main.amu.edu.pl/~sipkadan | http://www.public.asu.edu/~dsipka
----- Original Message -----
From: Tine & Colleen
To: CORPORA@HD.UIB.NO
Sent: Tuesday, April 16, 2002 8:13 PM
Subject: Corpora: sgml detagger
Hi
I am compiling a corpus for research purposes, and some of the texts are SGML-tagged.
Does anybody know an easy way to remove the tags and save the texts as 'raw' .txt files?
Maybe a Perl script?
Thanks in advance
Tine Lassen
Copenhagen
This archive was generated by hypermail 2b29 : Tue Apr 16 2002 - 20:24:03 MET DST