diff --git a/README.md b/README.md
index 8393488..f66c7d0 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ It includes a PDF converter that can transform PDF files
 into other text formats (such as HTML). It has an extensible
 PDF parser that can be used for other purposes than text analysis.
 
+
 Features
 --------
 
@@ -23,6 +24,7 @@ Features
  * Tagged contents extraction.
  * Automatic layout analysis.
 
+
 How to Install
 --------------
 
@@ -37,6 +39,7 @@ How to Install
 
     $ pdf2txt.py samples/simple1.pdf
 
+
 For CJK Languages
 -----------------
 
@@ -60,6 +63,7 @@ paste the following commands on a command line prompt:
     python tools\conv_cmap.py -c KSC-EUC=euc-kr -c KSC-Johab=johab -c KSCms-UHC=cp949 -c UniKS-UTF8=utf-8 pdfminer\cmap Adobe-Korea1 cmaprsrc\cid2code_Adobe_Korea1.txt
     python setup.py install
 
+
 Command Line Tools
 ------------------
 
@@ -87,6 +91,21 @@ but it's also possible to extract some meaningful contents (e.g. images).
 
 (For details, refer to the html document.)
 
+
+API Changes
+-----------
+
+As of November 2013, there were a few changes made to the PDFMiner API
+prior to October 2013. This is the result of code restructuring. Here
+is a list of the changes:
+
+ * PDFDocument class is moved to pdfdocument.py.
+ * PDFDocument class now takes a PDFParser object as an argument.
+   PDFDocument.set_parser() and PDFParser.set_document() is removed.
+ * PDFPage class is moved to pdfpage.py
+ * process_pdf function is implemented as a class method PDFPage.get_pages.
+
+
 TODO
 ----
 
@@ -97,6 +116,7 @@ TODO
  * Better documentation.
  * Crypt stream filter support.
 
+
 Related Projects
 ----------------
 
@@ -105,6 +125,7 @@ Related Projects
  * pdfbox
  * mupdf
 
+
 Terms and Conditions
 --------------------
 
diff --git a/docs/index.html b/docs/index.html
index 0039ffd..6404407 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -9,7 +9,7 @@
-Last Modified: Sat Oct 26 15:03:35 UTC 2013
+Last Modified: Sun Nov 17 06:32:44 UTC 2013
@@ -368,7 +368,18 @@ no stream header is displayed for the ease of saving it to a file.

Changes