Documentation updates.
parent cf1e3c9973
commit e39e39fa12
21
README.md
@@ -10,6 +10,7 @@ It includes a PDF converter that can transform PDF files
into other text formats (such as HTML). It has an extensible
PDF parser that can be used for other purposes than text analysis.


Features
--------

@@ -23,6 +24,7 @@ Features
* Tagged contents extraction.
* Automatic layout analysis.


How to Install
--------------

@@ -37,6 +39,7 @@ How to Install

$ pdf2txt.py samples/simple1.pdf


For CJK Languages
-----------------

@@ -60,6 +63,7 @@ paste the following commands on a command line prompt:
python tools\conv_cmap.py -c KSC-EUC=euc-kr -c KSC-Johab=johab -c KSCms-UHC=cp949 -c UniKS-UTF8=utf-8 pdfminer\cmap Adobe-Korea1 cmaprsrc\cid2code_Adobe_Korea1.txt
python setup.py install


Command Line Tools
------------------

@@ -87,6 +91,21 @@ but it's also possible to extract some meaningful contents (e.g. images).

(For details, refer to the HTML document.)


API Changes
-----------

As of November 2013, a few changes have been made to the PDFMiner API
as it stood prior to October 2013, as a result of code restructuring.
The changes are:

* The PDFDocument class has moved to pdfdocument.py.
* The PDFDocument class now takes a PDFParser object as an argument.
  PDFDocument.set_parser() and PDFParser.set_document() have been removed.
* The PDFPage class has moved to pdfpage.py.
* The process_pdf function is now implemented as the class method PDFPage.get_pages.
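
As a rough illustration of the changes listed above, the following sketch extracts text by way of the relocated `PDFPage` class. It assumes Python 3 and a PDFMiner release that includes these changes (for example the pdfminer.six fork). The helper name `extract_text` is hypothetical and is not part of PDFMiner.

```python
from io import StringIO


def extract_text(path, password=""):
    """Return the text of the PDF at `path` (hypothetical helper, not a PDFMiner API)."""
    # Imported lazily so that loading this sketch does not require PDFMiner.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage  # PDFPage now lives in pdfpage.py

    out = StringIO()
    rsrcmgr = PDFResourceManager()
    device = TextConverter(rsrcmgr, out, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    with open(path, "rb") as fp:
        # PDFPage.get_pages replaces the old process_pdf function. It creates
        # the PDFParser and the PDFDocument internally and yields one
        # PDFPage object per page.
        for page in PDFPage.get_pages(fp, password=password):
            interpreter.process_page(page)
    device.close()
    return out.getvalue()
```

With PDFMiner installed, `print(extract_text("samples/simple1.pdf"))` should print the text of the sample file used in the install check above.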


TODO
----

@@ -97,6 +116,7 @@ TODO
* Better documentation.
* Crypt stream filter support.


Related Projects
----------------

@@ -105,6 +125,7 @@ Related Projects
* <a href="http://www.pdfbox.org/">pdfbox</a>
* <a href="http://mupdf.com/">mupdf</a>


Terms and Conditions
--------------------

@@ -9,7 +9,7 @@

<div align=right class=lastmod>
<!-- hhmts start -->
-Last Modified: Sat Oct 26 15:03:35 UTC 2013
+Last Modified: Sun Nov 17 06:32:44 UTC 2013
<!-- hhmts end -->
</div>

@@ -368,7 +368,18 @@ no stream header is displayed for the ease of saving it to a file.

<h2><a name="changes">Changes</a></h2>
<ul>
-<li> 2013/10/22: Sudden resurgence of interest.
+<li> 2013/11/13: Bugfixes and minor improvements.<br>
+As of November 2013, a few changes have been made to the PDFMiner API
+as it stood prior to October 2013, as a result of code restructuring.
+The changes are:
+<ul>
+<li> The <code>PDFDocument</code> class has moved to <code>pdfdocument.py</code>.
+<li> The <code>PDFDocument</code> class now takes a <code>PDFParser</code> object as an argument.
+<li> <code>PDFDocument.set_parser()</code> and <code>PDFParser.set_document()</code> have been removed.
+<li> The <code>PDFPage</code> class has moved to <code>pdfpage.py</code>.
+<li> The <code>process_pdf</code> function is now implemented as the class method <code>PDFPage.get_pages</code>.
+</ul>
+<li> 2013/10/22: Sudden resurgence of interest. API changes.
Incorporated a lot of patches and robust handling of broken PDFs.
<li> 2011/05/15: Speed improvements for layout analysis.
<li> 2011/05/15: API changes. <code>LTText.get_text()</code> is added.