Xpdf is a PDF viewer and a set of command-line utilities for processing PDF files.

To convert a PDF document to plain text, use pdftotext. To preserve the layout of the pages as they would appear in print or on screen, specify the -layout command-line option (this is useful for single-column material). To decolumnize a document formatted as two or more columns of text, i.e., present it as a single column, specify the -raw command-line option. The -raw option keeps the text in content stream order, which often, but not always, undoes column formatting.
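As a rough illustration of the options above, the following Python sketch builds the pdftotext command line for each mode. The helper names (MODE_FLAGS, build_pdftotext_cmd, convert) are hypothetical and not part of Xpdf; the sketch assumes pdftotext is installed and on the PATH, and that it is invoked as "pdftotext [options] PDF-file text-file".

```python
import subprocess

# Hypothetical mapping from a mode name to the pdftotext option it stands for.
MODE_FLAGS = {
    "default": [],           # no extra option
    "layout": ["-layout"],   # preserve the physical page layout
    "raw": ["-raw"],         # content stream order, often undoes columns
}


def build_pdftotext_cmd(pdf_path, txt_path, mode="default"):
    """Return the argument list for one pdftotext invocation."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["pdftotext", *MODE_FLAGS[mode], pdf_path, txt_path]


def convert(pdf_path, txt_path, mode="default"):
    """Run pdftotext; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_pdftotext_cmd(pdf_path, txt_path, mode), check=True)
```

For example, convert("paper.pdf", "paper.txt", mode="raw") would run pdftotext with -raw on a hypothetical two-column file named paper.pdf.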

Tagged PDF is a feature of the PDF file format which allows the logical structure of a document (headings, paragraphs, lists, tables, etc.) to be preserved in the PDF file. Its primary application is accessibility. Probably the overwhelming majority of PDF documents don't use it. To find out whether a given document does, run the pdfinfo command on it; its output reports whether the file is tagged. Running pdfinfo over each of your documents in turn tells you how many are untagged.
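A minimal sketch of that count in Python follows. It assumes pdfinfo is on the PATH and that its output contains a line of the form "Tagged: yes" or "Tagged: no"; the function names is_tagged and count_untagged are hypothetical.

```python
import subprocess


def is_tagged(pdfinfo_output):
    """Return True/False from the 'Tagged:' line, or None if it is absent."""
    for line in pdfinfo_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Tagged":
            return value.strip().lower() == "yes"
    return None


def count_untagged(pdf_paths):
    """Run pdfinfo on each file and count those reported as not tagged."""
    untagged = 0
    for path in pdf_paths:
        result = subprocess.run(
            ["pdfinfo", path], capture_output=True, text=True, check=True
        )
        if is_tagged(result.stdout) is False:
            untagged += 1
    return untagged
```

Files for which pdfinfo prints no "Tagged:" line are not counted either way, since the sketch cannot tell whether they are tagged.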


None: xpdf (last edited 2007-06-12 18:23:05 by SamuelThibault)