documentation fix. oops
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@62 1aa58f4a-7d42-0410-adbc-911cccaed67c
parent
6b4044ddaf
commit
fd235369a9
25
README.html
@@ -14,7 +14,7 @@ Python PDF parser and analyzer
 
 <div align=right class=lastmod>
 <!-- hhmts start -->
-Last Modified: Sat Jan 10 20:13:39 JST 2009
+Last Modified: Sat Jan 10 20:18:36 JST 2009
 <!-- hhmts end -->
 </div>
 
@@ -80,12 +80,17 @@ http://pdf2html.tabesugi.net:8080/
 <li> Go to the <code>pdfminer</code> directory.
 <li> Do the following test:<br>
 <blockquote><pre>
-$ <strong>python -m tools.pdf2txt samples/simple1.pdf</strong>
+$ <strong>python -m pdflib.pdf2txt samples/simple1.pdf</strong>
 <html><head><meta http-equiv="Content-Type" content="text/html; charset=ascii">
 </head><body>
-<div style="position:absolute; top:50px;"><a name="0">Page 0</a></div><span style="position:absolute; border: 1px solid gray; left:0px; top:50px; width:612px; height:792px;"></span>
-<span style="position:absolute; writing-mode:lr-tb; left:100px; top:122px; font-size:24px;"> Hello World </span>
-<div style="position:absolute; top:0px;">Page: <a href="#0">0</a></div>
+<div style="position:absolute; top:50px;"><a name="1">Page 1</a></div><span style="position:absolute; border: 1px solid gray; left:0px; top:50px; width:612px; height:792px;"></span>
+<span style="position:absolute; writing-mode:lr-tb; left:100px; top:224px; font-size:22px;"> </span>
+<span style="position:absolute; writing-mode:lr-tb; left:106px; top:224px; font-size:22px;">Hello </span>
+<span style="position:absolute; writing-mode:lr-tb; left:168px; top:224px; font-size:22px;">World </span>
+<span style="position:absolute; writing-mode:lr-tb; left:100px; top:124px; font-size:22px;"> </span>
+<span style="position:absolute; writing-mode:lr-tb; left:206px; top:124px; font-size:22px;">Hello </span>
+<span style="position:absolute; writing-mode:lr-tb; left:368px; top:124px; font-size:22px;">World </span>
+<div style="position:absolute; top:0px;">Page: <a href="#1">1</a></div>
 </body></html>
 </pre></blockquote>
 <li> Done!
@@ -145,13 +150,13 @@ Unicode Standard.
 <p>
 Examples:
 <blockquote><pre>
-$ <strong>python -m tools.pdf2txt -o output.html samples/naacl06-shinyama.pdf</strong>
+$ <strong>python -m pdflib.pdf2txt -o output.html samples/naacl06-shinyama.pdf</strong>
 (extract text as an HTML file whose filename is output.html)
 
-$ <strong>python -m tools.pdf2txt -c euc-jp samples/jo.pdf</strong>
+$ <strong>python -m pdflib.pdf2txt -c euc-jp samples/jo.pdf</strong>
 (extract Japanese texts in vertical writing, CMap is required)
 
-$ <strong>python -m tools.pdf2txt -P mypassword secret.pdf</strong>
+$ <strong>python -m pdflib.pdf2txt -P mypassword secret.pdf</strong>
 (extract texts from an encrypted PDF file with a password)
 </pre></blockquote>
 
@@ -164,7 +169,7 @@ By default, it prints the extracted contents to stdout.
 <p>
 <dt> <code>-p <em>pageno[,pageno,...]</em></code>
 <dd> Speficies the comma-separated list of the page numbers to be extracted.
-Page numbers are starting from zero.
+Page numbers are starting from one.
 By default, it extracts texts from all the pages.
 <p>
 <dt> <code>-c <em>codec</em></code>
@@ -218,7 +223,7 @@ By default, it only prints the document trailer (like a header).
 <dt> <code>-p <em>pageno</em></code>
 <dd> Speficies the page number to be extracted.
 Multiple <code>-p</code> options are allowed.
-Note that page numbers start from zero.
+Note that page numbers start from one.
 <p>
 <dt> <code>-r</code> (raw)
 <dt> <code>-b</code> (binary)
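
Note on the -p change: the README now documents page numbers as starting from one rather than zero. The sketch below is only an illustration of how a comma-separated, one-based page list such as "1,3,5" could be turned into zero-based indices. It is not taken from the pdfminer source; the function name parse_page_list is hypothetical, and rejecting values below one is an assumption of this sketch.

```python
def parse_page_list(spec):
    """Parse a comma-separated list of one-based page numbers.

    Hypothetical helper for illustration only; not pdfminer's code.
    Returns a set of zero-based page indices.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        number = int(part)
        if number < 1:
            # Assumption: with one-based numbering, 0 is not a valid page.
            raise ValueError("page numbers start from one, got %d" % number)
        pages.add(number - 1)  # convert to a zero-based index
    return pages


if __name__ == "__main__":
    print(sorted(parse_page_list("1,3,5")))  # prints [0, 2, 4]
```

Under the old zero-based wording, "-p 0" would have meant the first page; under the new wording the first page is "-p 1", which is why the sample output in the README also changed from "Page 0" to "Page 1".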