Re: [Corpora-List] PDF Conversion

From: Alexander Osherenko (osherenko@gmx.de)
Date: Tue Mar 28 2006 - 17:57:26 MET DST


    Hi Ken,

    I have worked with the PDF2HTML tool, and my experience is that
    although it is free software you still pay with your time and
    temper :) - the tool produces imprecise results (HTML tags or
    footnotes in the wrong order, and improperly nested HTML tags such as
    <b><i></b></i>, to name one). Nevertheless, once you have finished
    your first experiments with the tool, you may find that you have
    become quite an expert in PDF, HTML, and PDF2HTML, and that the tool
    is actually not so bad...
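
As a rough illustration of the nesting problem mentioned above, here is a minimal sketch of how such output could be checked before further processing. It uses only Python's standard html.parser module; the names NestingChecker, find_misnested and VOID_TAGS are made up for this example and are not part of PDF2HTML or any other tool, and the short void-tag list is an assumption, not a complete one.

```python
from html.parser import HTMLParser

# Assumption: a short, incomplete list of elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class NestingChecker(HTMLParser):
    """Records closing tags that do not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.problems = []   # (line, expected_tag, found_tag) tuples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        line, _ = self.getpos()
        expected = self.open_tags[-1] if self.open_tags else None
        self.problems.append((line, expected, tag))
        # Recover: if the tag is open further down the stack, close that one.
        if tag in self.open_tags:
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]


def find_misnested(html):
    """Return a list of (line, expected_tag, found_tag) nesting problems."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    print(find_misnested("<b><i>text</b></i>"))
```

Each reported tuple gives the line number, the tag that should have been closed first, and the tag that was actually closed. This only flags the problem; repairing the markup would still need a separate step.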

    Sorry if my answer is somewhat confusing, but I hope it helps.

    Cheers

    Alexander

    Ken Litkowski wrote:

    > Is anyone aware of free software that will process PDF documents into
    > text streams? There is a PDF2HTML (with an XML option) that will
    > create page-centric versions, but this does not really distinguish
    > text from format. I want to ignore (or be able to treat separately)
    > such things as headers, footnotes, tables, figures, and equations.
    > (Note that even Google retains the page-centric view.)
    >
    > Thanks,
    > Ken


