Recently, I evaluated both commercial and free software for PDF-to-text
conversion, and I've come to the depressing conclusion that there is
really nothing better than the "Save as Text..." function in Adobe
Reader (6.0).
That said, I don't think I am familiar with the converter by Ted
Briscoe's group, which Hamish Cunningham mentioned in a reply to your
post.
My experience is that you cannot find a tool that can handle (1) the
separation of figure and table captions from the running text, (2)
unusual characters and symbols (Greek, math), and (3) the different
ways in which PDF files are encoded.
Best,
Kristofer Franzén
Ken Litkowski wrote:
> Is anyone aware of free software that will process PDF documents into
> text streams? There is a PDF2HTML (with an XML option) that will
> create page-centric versions, but this does not really distinguish
> text from format. I want to ignore (or be able to treat separately)
> such things as headers, footnotes, tables, figures, and equations.
> (Note that even Google retains the page-centric view.)
>
> Thanks,
> Ken