Documentation updated.
parent
87842233b3
commit
86348eba2f
2
Makefile
|
@@ -27,7 +27,7 @@ sdist: distclean MANIFEST.in
|
||||||
register: distclean MANIFEST.in
|
register: distclean MANIFEST.in
|
||||||
$(PYTHON) setup.py sdist upload register
|
$(PYTHON) setup.py sdist upload register
|
||||||
|
|
||||||
WEBDIR=$$HOME/Site/unixuser.org/python/$(PACKAGE)
|
WEBDIR=$$HOME/work/Site/unixuser.org/python/$(PACKAGE)
|
||||||
publish:
|
publish:
|
||||||
$(CP) docs/*.html docs/*.png docs/*.css $(WEBDIR)
|
$(CP) docs/*.html docs/*.png docs/*.css $(WEBDIR)
|
||||||
|
|
||||||
|
|
|
@@ -0,0 +1,62 @@
|
||||||
|
## PDFMiner
|
||||||
|
|
||||||
|
PDFMiner is a tool for extracting information from PDF documents.
|
||||||
|
Unlike other PDF-related tools, it focuses entirely on getting
|
||||||
|
and analyzing text data. PDFMiner allows one to obtain
|
||||||
|
the exact location of text on a page, as well as
|
||||||
|
other information such as fonts or lines.
|
||||||
|
It includes a PDF converter that can transform PDF files
|
||||||
|
into other text formats (such as HTML). It has an extensible
|
||||||
|
PDF parser that can be used for purposes other than text analysis.
|
||||||
|
|
||||||
|
|
||||||
|
**Features**
|
||||||
|
|
||||||
|
* Written entirely in Python.
|
||||||
|
* Parse, analyze, and convert PDF documents.
|
||||||
|
* PDF-1.7 specification support. (well, almost)
|
||||||
|
* CJK languages and vertical writing scripts support.
|
||||||
|
* Various font types (Type1, TrueType, Type3, and CID) support.
|
||||||
|
* Basic encryption (RC4) support.
|
||||||
|
* Outline (TOC) extraction.
|
||||||
|
* Tagged contents extraction.
|
||||||
|
* Automatic layout analysis.
|
||||||
|
|
||||||
|
|
||||||
|
**How to Install**
|
||||||
|
|
||||||
|
* Install Python 2.4 or newer. (**Python 3 is not supported.**)
|
||||||
|
* Download the source code.
|
||||||
|
* Unpack it.
|
||||||
|
* Run `setup.py`:
|
||||||
|
|
||||||
|
$ python setup.py install
|
||||||
|
|
||||||
|
* Run the following test:
|
||||||
|
|
||||||
|
$ pdf2txt.py samples/simple1.pdf
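
The version requirement in the first step can be checked before running `setup.py`. The following is only a sketch: the helper name `unsupported_reason` is hypothetical and is not part of PDFMiner. It encodes nothing beyond the stated rule (Python 2.4 or newer, Python 3 not supported).

```python
import sys


def unsupported_reason(version_info):
    """Return a short message if the interpreter does not meet the stated
    requirement (Python 2.4 or newer, Python 3 not supported), else None.

    Hypothetical helper for illustration; not part of PDFMiner.
    """
    major, minor = version_info[0], version_info[1]
    if major >= 3:
        return "Python 3 is not supported"
    if (major, minor) < (2, 4):
        return "Python 2.4 or newer is required"
    return None


# Report on the interpreter that is running this script.
reason = unsupported_reason(sys.version_info)
print(reason or "interpreter version looks supported")
```

Run it with the same interpreter you intend to use for `python setup.py install`.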
|
||||||
|
|
||||||
|
|
||||||
|
**For CJK Languages**
|
||||||
|
|
||||||
|
In order to process CJK languages, do the following before
|
||||||
|
running setup.py install:
|
||||||
|
|
||||||
|
$ make cmap
|
||||||
|
python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt
|
||||||
|
reading 'cmaprsrc/cid2code_Adobe_CNS1.txt'...
|
||||||
|
writing 'CNS1_H.py'...
|
||||||
|
...
|
||||||
|
$ python setup.py install
|
||||||
|
|
||||||
|
On Windows machines that do not have the `make` command,
|
||||||
|
paste the following commands at a command prompt:
|
||||||
|
|
||||||
|
mkdir pdfminer\cmap
|
||||||
|
python tools\conv_cmap.py -c B5=cp950 -c UniCNS-UTF8=utf-8 pdfminer\cmap Adobe-CNS1 cmaprsrc\cid2code_Adobe_CNS1.txt
|
||||||
|
python tools\conv_cmap.py -c GBK-EUC=cp936 -c UniGB-UTF8=utf-8 pdfminer\cmap Adobe-GB1 cmaprsrc\cid2code_Adobe_GB1.txt
|
||||||
|
python tools\conv_cmap.py -c RKSJ=cp932 -c EUC=euc-jp -c UniJIS-UTF8=utf-8 pdfminer\cmap Adobe-Japan1 cmaprsrc\cid2code_Adobe_Japan1.txt
|
||||||
|
python tools\conv_cmap.py -c KSC-EUC=euc-kr -c KSC-Johab=johab -c KSCms-UHC=cp949 -c UniKS-UTF8=utf-8 pdfminer\cmap Adobe-Korea1 cmaprsrc\cid2code_Adobe_Korea1.txt
|
||||||
|
python setup.py install
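
The four `conv_cmap.py` invocations above follow one pattern: a list of `-c NAME=codec` pairs, then the output directory, the character collection, and the matching `cid2code` file. As a sketch only, the script below rebuilds those argument lists from a table, using the platform's path separator. The names `CMAP_JOBS` and `build_commands` are hypothetical and not part of PDFMiner; the flags and file names are copied from the commands above.

```python
import os

# (character collection, cid2code file, codec mappings), copied from the
# commands listed above. CMAP_JOBS is a hypothetical name for this sketch.
CMAP_JOBS = [
    ("Adobe-CNS1", "cid2code_Adobe_CNS1.txt",
     [("B5", "cp950"), ("UniCNS-UTF8", "utf-8")]),
    ("Adobe-GB1", "cid2code_Adobe_GB1.txt",
     [("GBK-EUC", "cp936"), ("UniGB-UTF8", "utf-8")]),
    ("Adobe-Japan1", "cid2code_Adobe_Japan1.txt",
     [("RKSJ", "cp932"), ("EUC", "euc-jp"), ("UniJIS-UTF8", "utf-8")]),
    ("Adobe-Korea1", "cid2code_Adobe_Korea1.txt",
     [("KSC-EUC", "euc-kr"), ("KSC-Johab", "johab"),
      ("KSCms-UHC", "cp949"), ("UniKS-UTF8", "utf-8")]),
]


def build_commands(python="python"):
    """Return one argument list per conv_cmap.py invocation."""
    outdir = os.path.join("pdfminer", "cmap")
    commands = []
    for collection, filename, codecs in CMAP_JOBS:
        argv = [python, os.path.join("tools", "conv_cmap.py")]
        for name, codec in codecs:
            argv += ["-c", "%s=%s" % (name, codec)]
        argv += [outdir, collection, os.path.join("cmaprsrc", filename)]
        commands.append(argv)
    return commands


# Print the commands instead of running them, so they can be reviewed first.
for argv in build_commands():
    print(" ".join(argv))
```

To run the commands rather than print them, create the `pdfminer/cmap` directory first, then pass each list to `subprocess.call` from the top of the source tree.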
|
||||||
|
|
||||||
|
|
|
@@ -1 +0,0 @@
|
||||||
See docs/index.html
|
|
|
@@ -9,7 +9,7 @@
|
||||||
|
|
||||||
<div align=right class=lastmod>
|
<div align=right class=lastmod>
|
||||||
<!-- hhmts start -->
|
<!-- hhmts start -->
|
||||||
Last Modified: Tue Oct 22 13:19:10 UTC 2013
|
Last Modified: Tue Oct 22 15:16:49 UTC 2013
|
||||||
<!-- hhmts end -->
|
<!-- hhmts end -->
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
@@ -139,7 +139,7 @@ In order to process CJK languages, you need an additional step to take
|
||||||
during installation:
|
during installation:
|
||||||
<blockquote><pre>
|
<blockquote><pre>
|
||||||
# <strong>make cmap</strong>
|
# <strong>make cmap</strong>
|
||||||
python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt cp950 big5
|
python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt
|
||||||
reading 'cmaprsrc/cid2code_Adobe_CNS1.txt'...
|
reading 'cmaprsrc/cid2code_Adobe_CNS1.txt'...
|
||||||
writing 'CNS1_H.py'...
|
writing 'CNS1_H.py'...
|
||||||
...
|
...
|
||||||
|
|