Lou Burnard writes:
> The other tool for this purpose which no-one has (so far) mentioned is
> tidy -- http://tidy.sourceforge.net
>
> It will take almost any html and turn it into something usable very
> fast; it's also very robust and there is a choice of APIs for
> integrating it into your own production system.
Just a warning to folks: while Tidy is good, it can get very confused
by bogus HTML and can crash horribly in ways that are non-trivial to
debug. I've found that pages with embedded bogus JavaScript can cause
lots of problems, as can pages in less common character encodings.
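One way to contain failures like these, sketched in Python under stated assumptions (none of this is an API that Tidy itself provides): transcode each page to UTF-8 before Tidy sees it, then run the `tidy` command-line program in a child process with a timeout. A crash or hang then costs you one page rather than the whole production run. The flag set (`-q -asxhtml -utf8`), the Latin-1 fallback, and the function names `to_utf8` and `run_tidy_safely` are assumptions made for illustration.

```python
import subprocess
from typing import Optional, Sequence, Tuple

# Assumed command line: classic HTML Tidy and tidy-html5 accept -q (quiet),
# -asxhtml (emit XHTML) and -utf8 (UTF-8 in and out). Adjust for your build.
DEFAULT_TIDY_CMD = ["tidy", "-q", "-asxhtml", "-utf8"]


def to_utf8(raw: bytes, declared: Optional[str] = None) -> bytes:
    """Re-encode a page as UTF-8 before handing it to Tidy.

    Tries the declared charset, then UTF-8, then falls back to Latin-1,
    which maps every byte and so never fails (though it may mis-decode).
    """
    for enc in (declared, "utf-8"):
        if not enc:
            continue
        try:
            return raw.decode(enc).encode("utf-8")
        except (LookupError, UnicodeDecodeError):
            pass  # unknown codec name or undecodable bytes: try the next one
    return raw.decode("latin-1").encode("utf-8")


def run_tidy_safely(
    raw: bytes,
    declared: Optional[str] = None,
    cmd: Sequence[str] = DEFAULT_TIDY_CMD,
    timeout: float = 10.0,
) -> Tuple[Optional[bytes], str]:
    """Run Tidy in a child process so a hang or crash cannot take us down.

    Returns (output, status); status is one of "ok", "warnings",
    "errors", "crashed", "timeout" or "not-found".
    """
    data = to_utf8(raw, declared)
    try:
        proc = subprocess.run(
            list(cmd), input=data, capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return None, "timeout"  # subprocess.run kills the hung child
    except FileNotFoundError:
        return None, "not-found"
    # Tidy's exit codes: 0 = clean, 1 = warnings, 2 = errors. Anything else
    # (including negative codes for signals on POSIX) is treated as a crash.
    if proc.returncode == 0:
        return proc.stdout, "ok"
    if proc.returncode == 1:
        return proc.stdout, "warnings"
    if proc.returncode == 2:
        return None, "errors"
    return None, "crashed"
```

Because `cmd` is a parameter, the same wrapper can drive a different Tidy build or a stand-in program, and pages that come back as "crashed" or "timeout" can be logged and set aside for manual debugging instead of halting the batch.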
-tree
-- 
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
  "You can't fake quality any more than you can fake a good meal." (W.S.B.)
This archive was generated by hypermail 2b29 : Tue Aug 09 2005 - 22:40:57 MET DST