git-svn-id: https://pdfminer.googlecode.com/svn/trunk/pdfminer@200 1aa58f4a-7d42-0410-adbc-911cccaed67c
pull/1/head
yusuke.shinyama.dummy 2010-04-06 10:51:16 +00:00
parent 434720f767
commit e2e9adfaf3
1 changed file with 4 additions and 7 deletions

@@ -19,7 +19,7 @@ Python PDF parser and analyzer
 <div align=right class=lastmod>
 <!-- hhmts start -->
-Last Modified: Sun Mar 28 07:21:28 UTC 2010
+Last Modified: Mon Apr 5 23:15:31 UTC 2010
 <!-- hhmts end -->
 </div>
@@ -46,7 +46,7 @@ extracting some meaningful information out of PDF documents.
 Unlike other PDF-related tools, it focuses entirely on getting
 and analyzing text data from PDFs. PDFMiner allows to obtain
 the exact location of texts in a page, as well as
-other extra information such as font information or ruled lines.
+other information such as fonts or ruled lines.
 It includes a PDF converter that can transform PDF files
 into other text formats (such as HTML). It has an extensible
 PDF parser that can be used for other purposes instead of text analysis.
@@ -131,11 +131,8 @@ W o r l d
 <p>
 <a name="cmap"></a>
 <h3>For CJK languages</h3>
-In order to handle CJK languages,
-an additional data called <code>CMap</code> is required.
-CMap files are not installed by default.
-<p>
-Here is the additional step you need to take:
+In order to process CJK languages, you need an additional step to take
+during installation:
 <blockquote><pre>
 # <strong>make cmap</strong>
 python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt cp950 big5