Re: [Corpora-List] PDF Conversion

From: radev@umich.edu
Date: Tue Mar 28 2006 - 21:15:35 MET DST


    My student Alex C de Baca recommended this software:

    http://www.foolabs.com/xpdf/index.html
    http://www.bluem.net/downloads/pdftotext_en/
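For the question quoted below, here is a minimal sketch of how the pdftotext tool from the Xpdf package might be driven from Python. It assumes pdftotext is installed and on the PATH. The function names (`pdf_to_text`, `strip_running_lines`) and the 0.6 threshold are illustrative assumptions, not part of Xpdf. The header/footer filter is only a rough heuristic: it drops the first or last line of a page when the same line recurs on most pages, so it will miss page numbers that change from page to page, and it does nothing about tables, figures, or equations.

```python
import subprocess
from collections import Counter


def pdf_to_text(pdf_path, layout=False):
    """Run pdftotext and return the text; pages are separated by form feeds."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        cmd.append("-layout")  # keep the physical page layout
    cmd += [pdf_path, "-"]     # "-" writes the text to stdout
    result = subprocess.run(cmd, check=True, capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")


def strip_running_lines(text, min_share=0.6):
    """Drop first/last page lines that repeat on most pages (hypothetical heuristic)."""
    pages = [p.strip("\n").split("\n") for p in text.split("\f") if p.strip()]
    counts = Counter()
    for lines in pages:
        # Count each candidate once per page, even if it is both first and last line.
        for edge in {lines[0].strip(), lines[-1].strip()}:
            if edge:
                counts[edge] += 1
    repeated = {
        line for line, n in counts.items()
        if len(pages) > 1 and n / len(pages) >= min_share
    }
    cleaned = []
    for lines in pages:
        if lines and lines[0].strip() in repeated:
            lines = lines[1:]
        if lines and lines[-1].strip() in repeated:
            lines = lines[:-1]
        cleaned.append("\n".join(lines).strip("\n"))
    return "\n\n".join(cleaned)
```

Typical use would be `strip_running_lines(pdf_to_text("paper.pdf"))`, where `paper.pdf` is a placeholder file name.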

    Ken Litkowski wrote:
    >
    > Is anyone aware of free software that will process PDF documents into
    > text streams? There is a PDF2HTML (with an XML option) that will create
    > page-centric versions, but this does not really distinguish text from
    > format. I want to ignore (or be able to treat separately) such things
    > as headers, footnotes, tables, figures, and equations. (Note that even
    > Google retains the page-centric view.)
    >
    > Thanks,
    > Ken
    > --
    > Ken Litkowski TEL.: 301-482-0237
    > CL Research EMAIL: ken@clres.com
    > 9208 Gue Road
    > Damascus, MD 20872-1025 USA Home Page: http://www.clres.com
    >

    -- 
    Dragomir R. Radev                                         radev@umich.edu
    Associate Professor of Information, Electrical Engineering and
    Computer Science, and Linguistics, the University of Michigan, Ann Arbor
    Phone: 734-615-5225   Fax: 734-764-2475    http://www.si.umich.edu/~radev
    


