Ken Litkowski writes:
> Is anyone aware of free software that will process PDF documents into
> text streams? There is a PDF2HTML (with an XML option) that will create
> page-centric versions, but this does not really distinguish text from
> format. I want to ignore (or be able to treat separately) such things
> as headers, footnotes, tables, figures, and equations. (Note that even
> Google retains the page-centric view.)
PDF is a page-centric format, so you are unlikely to find
something that does what you are looking for: headers, footnotes,
tables, etc. are not distinguished from the surrounding content
in any special way.
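
Since nothing in the page content marks these elements, any separation has to be guessed after the text is extracted. As a rough illustration only, here is a minimal sketch that assumes you already have the text split into pages (each page a list of lines) and treats a first or last line that repeats across most pages as a running header or footer. The function name and the 0.6 threshold are made up for this example and do not come from any existing tool.

```python
from collections import Counter


def strip_repeated_edges(pages, min_fraction=0.6):
    """Drop first/last lines that repeat across most pages.

    pages: list of pages, each a list of already-extracted text lines.
    min_fraction: hypothetical cutoff; a line counts as a running
    header or footer if it opens (or closes) at least this fraction
    of the pages, and at least two of them.
    """
    if not pages:
        return []

    # Count how often each line appears as the first or last line of a page.
    first = Counter(p[0].strip() for p in pages if p)
    last = Counter(p[-1].strip() for p in pages if p)

    threshold = max(2, int(min_fraction * len(pages)))
    headers = {line for line, n in first.items() if n >= threshold}
    footers = {line for line, n in last.items() if n >= threshold}

    cleaned = []
    for page in pages:
        body = list(page)
        if body and body[0].strip() in headers:
            body = body[1:]
        if body and body[-1].strip() in footers:
            body = body[:-1]
        cleaned.append(body)
    return cleaned
```

This only catches lines that are identical from page to page. A footer carrying a page number, or footnotes, tables, and equations, would need further guesswork based on position or font, which is exactly the information a plain text stream no longer has.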
-- 
Tom Emerson
Software Architect
Basis Technology Corp.
http://www.basistech.com
"You can't fake quality any more than you can fake a good meal." (W.S.B.)