From a1cae26a744ac89caf2e7c2d7fd7bda5c0c25296 Mon Sep 17 00:00:00 2001
From: Yusuke Shinyama
Date: Wed, 23 Oct 2013 00:21:03 +0900
Subject: [PATCH] Documentation updated.

---
 README.md | 44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index 8a2cef0..45ccb6c 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,5 @@
-## PDFMiner
+PDFMiner
+==========
 
 PDFMiner is a tool for extracting information from PDF documents.
 Unlike other PDF-related tools, it focuses entirely on getting
@@ -9,8 +10,7 @@ It includes a PDF converter that can transform PDF files
 into other text formats (such as HTML). It has an extensible
 PDF parser that can be used for other purposes than text analysis.
 
-
-** Features **
+**Features**
 
   * Written entirely in Python.
   * Parse, analyze, and convert PDF documents.
@@ -22,41 +22,37 @@ PDF parser that can be used for other purposes than text analysis.
   * Tagged contents extraction.
   * Automatic layout analysis.
 
-
-** How to Install **
+**How to Install**
 
   * Install Python 2.4 or newer. (**Python 3 is not supported.**)
   * Download the source code.
   * Unpack it.
   * Run `setup.py`:
 
-    $ python setup.py install
+        $ python setup.py install
 
   * Do the following test:
 
-    $ pdf2txt.py samples/simple1.pdf
+        $ pdf2txt.py samples/simple1.pdf
 
-
-** For CJK Languages **
+**For CJK Languages**
 
 In order to process CJK languages, do the following before
 running setup.py install:
 
-  $ make cmap
-  python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt
-  reading 'cmaprsrc/cid2code_Adobe_CNS1.txt'...
-  writing 'CNS1_H.py'...
-  ...
-  $ python setup.py install
+    $ make cmap
+    python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt
+    reading 'cmaprsrc/cid2code_Adobe_CNS1.txt'...
+    writing 'CNS1_H.py'...
+    ...
+    $ python setup.py install
 
-On Windows machines which don't have make command,
+On Windows machines which don't have `make` command,
 paste the following commands on a command line prompt:
 
-  mkdir pdfminer\cmap
-  python tools\conv_cmap.py -c B5=cp950 -c UniCNS-UTF8=utf-8 pdfminer\cmap Adobe-CNS1 cmaprsrc\cid2code_Adobe_CNS1.txt
-  python tools\conv_cmap.py -c GBK-EUC=cp936 -c UniGB-UTF8=utf-8 pdfminer\cmap Adobe-GB1 cmaprsrc\cid2code_Adobe_GB1.txt
-  python tools\conv_cmap.py -c RKSJ=cp932 -c EUC=euc-jp -c UniJIS-UTF8=utf-8 pdfminer\cmap Adobe-Japan1 cmaprsrc\cid2code_Adobe_Japan1.txt
-  python tools\conv_cmap.py -c KSC-EUC=euc-kr -c KSC-Johab=johab -c KSCms-UHC=cp949 -c UniKS-UTF8=utf-8 pdfminer\cmap Adobe-Korea1 cmaprsrc\cid2code_Adobe_Korea1.txt
-  python setup.py install
-
-
+    mkdir pdfminer\cmap
+    python tools\conv_cmap.py -c B5=cp950 -c UniCNS-UTF8=utf-8 pdfminer\cmap Adobe-CNS1 cmaprsrc\cid2code_Adobe_CNS1.txt
+    python tools\conv_cmap.py -c GBK-EUC=cp936 -c UniGB-UTF8=utf-8 pdfminer\cmap Adobe-GB1 cmaprsrc\cid2code_Adobe_GB1.txt
+    python tools\conv_cmap.py -c RKSJ=cp932 -c EUC=euc-jp -c UniJIS-UTF8=utf-8 pdfminer\cmap Adobe-Japan1 cmaprsrc\cid2code_Adobe_Japan1.txt
+    python tools\conv_cmap.py -c KSC-EUC=euc-kr -c KSC-Johab=johab -c KSCms-UHC=cp949 -c UniKS-UTF8=utf-8 pdfminer\cmap Adobe-Korea1 cmaprsrc\cid2code_Adobe_Korea1.txt
+    python setup.py install