Re: [Corpora-List] Comments on PDF Conversion

From: Tom Emerson (tree@basistech.com)
Date: Tue Mar 28 2006 - 23:25:23 MET DST


    Ken Litkowski writes:
    > I only have Acrobat reader, so I can't create in it. But, it seems to
    > me that it should be like any other word processor where you can insert
    > things like footnotes, headers, figures, tables, etc. With at least
    > WordPerfect (with its reveal codes), you can see that codes are used to
    > mark things up. Mustn't Adobe have something similar in Acrobat?

    When you add information to a page in PDF you are adding information
    at a given coordinate position on the page. While PDF has "structured
    extensions" that creating apps can use, you cannot and should not rely
    on these. Indeed, it is not uncommon to come across a PDF that doesn't
    have any text at all in it: the pages are bitmaps. This is especially
    true of PDF files that were created by scanning a document. Sometimes
    OCR is performed to associate text with the image, but this cannot
    be relied on.
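
    To make the coordinate point concrete, here is a minimal Python
    sketch of what a converter has to do once it has pulled positioned
    text off a page. Everything in it is an assumption for illustration:
    the Fragment type, the function names, and the 2-point line tolerance
    are hypothetical and do not come from any particular PDF library. It
    assumes some extractor has already produced (x, y, text) triples,
    with y measured upward from the bottom of the page as is usual in
    PDF.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """A piece of text placed at a page coordinate (hypothetical type)."""
    x: float  # distance from the left edge, in points
    y: float  # distance from the bottom edge, in points
    text: str


def reading_order(fragments: List[Fragment], line_tolerance: float = 2.0) -> List[str]:
    """Guess text lines from positions alone.

    Fragments are visited from the top of the page down; a fragment joins
    the current line if its y is within line_tolerance of that line's
    first fragment. Each line is then ordered left to right.
    """
    lines: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]


def looks_image_only(fragments: List[Fragment]) -> bool:
    """True when a page yielded no usable text, e.g. a scan without OCR."""
    return not any(f.text.strip() for f in fragments)


# Fragments arrive in whatever order the creating application wrote them.
page = [
    Fragment(72, 700, "Hello"),
    Fragment(300, 640, "column"),
    Fragment(110, 700.5, "world"),
    Fragment(72, 640, "Second"),
]
print(reading_order(page))
print(looks_image_only([]))
```

    Note that nothing here knows about footnotes, headers, or tables:
    that structure has to be guessed from geometry, which is why
    conversion results vary so much from one document to the next.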

    One vendor I haven't seen mentioned is PDFlib GmbH, though this is a
    commercial solution so it may not be useful to you:

    http://www.pdflib.com/index.htm

    Their tools choke on documents with more esoteric content (e.g.,
    Indic or other complex scripts), but are generally pretty good,
    particularly for English.

    -- 
    Tom Emerson                                          Basis Technology Corp.
    Software Architect                                 http://www.basistech.com
     "You can't fake quality any more than you can fake a good meal." (W.S.B.)
    


