documentation fix. oops

git-svn-id: https://pdfminer.googlecode.com/svn/trunk/pdfminer@62 1aa58f4a-7d42-0410-adbc-911cccaed67c
yusuke.shinyama.dummy 2009-01-10 11:19:23 +00:00
parent 6b4044ddaf
commit fd235369a9
1 changed file with 15 additions and 10 deletions

@@ -14,7 +14,7 @@ Python PDF parser and analyzer
<div align=right class=lastmod> <div align=right class=lastmod>
<!-- hhmts start --> <!-- hhmts start -->
Last Modified: Sat Jan 10 20:13:39 JST 2009 Last Modified: Sat Jan 10 20:18:36 JST 2009
<!-- hhmts end --> <!-- hhmts end -->
</div> </div>
@@ -80,12 +80,17 @@ http://pdf2html.tabesugi.net:8080/
<li> Go to the <code>pdfminer</code> directory. <li> Go to the <code>pdfminer</code> directory.
<li> Do the following test:<br> <li> Do the following test:<br>
<blockquote><pre> <blockquote><pre>
$ <strong>python -m tools.pdf2txt samples/simple1.pdf</strong> $ <strong>python -m pdflib.pdf2txt samples/simple1.pdf</strong>
&lt;html&gt;&lt;head&gt;&lt;meta http-equiv="Content-Type" content="text/html; charset=ascii"&gt; &lt;html&gt;&lt;head&gt;&lt;meta http-equiv="Content-Type" content="text/html; charset=ascii"&gt;
&lt;/head&gt;&lt;body&gt; &lt;/head&gt;&lt;body&gt;
&lt;div style="position:absolute; top:50px;"&gt;&lt;a name="0"&gt;Page 0&lt;/a&gt;&lt;/div&gt;&lt;span style="position:absolute; border: 1px solid gray; left:0px; top:50px; width:612px; height:792px;"&gt;&lt;/span&gt; &lt;div style="position:absolute; top:50px;"&gt;&lt;a name="1"&gt;Page 1&lt;/a&gt;&lt;/div&gt;&lt;span style="position:absolute; border: 1px solid gray; left:0px; top:50px; width:612px; height:792px;"&gt;&lt;/span&gt;
&lt;span style="position:absolute; writing-mode:lr-tb; left:100px; top:122px; font-size:24px;"&gt; Hello World &lt;/span&gt; &lt;span style="position:absolute; writing-mode:lr-tb; left:100px; top:224px; font-size:22px;"&gt; &lt;/span&gt;
&lt;div style="position:absolute; top:0px;"&gt;Page: &lt;a href="#0"&gt;0&lt;/a&gt;&lt;/div&gt; &lt;span style="position:absolute; writing-mode:lr-tb; left:106px; top:224px; font-size:22px;"&gt;Hello &lt;/span&gt;
&lt;span style="position:absolute; writing-mode:lr-tb; left:168px; top:224px; font-size:22px;"&gt;World &lt;/span&gt;
&lt;span style="position:absolute; writing-mode:lr-tb; left:100px; top:124px; font-size:22px;"&gt; &lt;/span&gt;
&lt;span style="position:absolute; writing-mode:lr-tb; left:206px; top:124px; font-size:22px;"&gt;Hello &lt;/span&gt;
&lt;span style="position:absolute; writing-mode:lr-tb; left:368px; top:124px; font-size:22px;"&gt;World &lt;/span&gt;
&lt;div style="position:absolute; top:0px;"&gt;Page: &lt;a href="#1"&gt;1&lt;/a&gt;&lt;/div&gt;
&lt;/body&gt;&lt;/html&gt; &lt;/body&gt;&lt;/html&gt;
</pre></blockquote> </pre></blockquote>
<li> Done! <li> Done!
@@ -145,13 +150,13 @@ Unicode Standard.
<p> <p>
Examples: Examples:
<blockquote><pre> <blockquote><pre>
$ <strong>python -m tools.pdf2txt -o output.html samples/naacl06-shinyama.pdf</strong> $ <strong>python -m pdflib.pdf2txt -o output.html samples/naacl06-shinyama.pdf</strong>
(extract text as an HTML file whose filename is output.html) (extract text as an HTML file whose filename is output.html)
$ <strong>python -m tools.pdf2txt -c euc-jp samples/jo.pdf</strong> $ <strong>python -m pdflib.pdf2txt -c euc-jp samples/jo.pdf</strong>
(extract Japanese texts in vertical writing, CMap is required) (extract Japanese texts in vertical writing, CMap is required)
$ <strong>python -m tools.pdf2txt -P mypassword secret.pdf</strong> $ <strong>python -m pdflib.pdf2txt -P mypassword secret.pdf</strong>
(extract texts from an encrypted PDF file with a password) (extract texts from an encrypted PDF file with a password)
</pre></blockquote> </pre></blockquote>
@@ -164,7 +169,7 @@ By default, it prints the extracted contents to stdout.
<p> <p>
<dt> <code>-p <em>pageno[,pageno,...]</em></code> <dt> <code>-p <em>pageno[,pageno,...]</em></code>
<dd> Speficies the comma-separated list of the page numbers to be extracted. <dd> Speficies the comma-separated list of the page numbers to be extracted.
Page numbers are starting from zero. Page numbers are starting from one.
By default, it extracts texts from all the pages. By default, it extracts texts from all the pages.
<p> <p>
<dt> <code>-c <em>codec</em></code> <dt> <code>-c <em>codec</em></code>
@@ -218,7 +223,7 @@ By default, it only prints the document trailer (like a header).
<dt> <code>-p <em>pageno</em></code> <dt> <code>-p <em>pageno</em></code>
<dd> Speficies the page number to be extracted. <dd> Speficies the page number to be extracted.
Multiple <code>-p</code> options are allowed. Multiple <code>-p</code> options are allowed.
Note that page numbers start from zero. Note that page numbers start from one.
<p> <p>
<dt> <code>-r</code> (raw) <dt> <code>-r</code> (raw)
<dt> <code>-b</code> (binary) <dt> <code>-b</code> (binary)