diff --git a/TODO b/TODO
index 57775ce..2dc8b81 100644
--- a/TODO
+++ b/TODO
@@ -3,5 +3,5 @@ TODOs:
  - Better text extraction / layout analysis.
  - Better API Documentation.
  - Robust error handling.
- - Any special handling for linearized PDFs?
- - Handle crypt filter. (More sample documents are needed!)
+ - Crypt stream filter support. (More sample documents are needed!)
+ - CCITTFax stream filter support.
diff --git a/docs/index.html b/docs/index.html
index 9057053..6c0abd7 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -19,7 +19,7 @@ Python PDF parser and analyzer
-Last Modified: Mon Jan 4 23:23:00 JST 2010
+Last Modified: Sat Jan 30 16:32:50 JST 2010
@@ -204,6 +204,10 @@ HTML-like tags. pdf2txt tries to extract its content streams rather than inferri
 Tags used here are defined in the PDF specification (See §10.7 "Tagged PDF").

+ -I image_directory
+Specifies the output directory for image extraction.
+Currently only JPEG images are supported.
+

 -D direction
 -M char_margin
 -L line_margin
@@ -334,6 +338,8 @@ no stream header is displayed for the ease of saving it to a file.
 PEP-257 conformance.
  • Better text extraction / layout analysis.
  • Better API Documentation.
+ • Crypt stream filter support. (More sample documents are needed!)
+ • CCITTFax stream filter support.
  • Robust error handling.
@@ -341,6 +347,7 @@ no stream header is displayed for the ease of saving it to a file.

    Changes