documentation

git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@25 1aa58f4a-7d42-0410-adbc-911cccaed67c
yusuke.shinyama.dummy 2008-04-27 11:55:51 +00:00
parent c07b6a8c40
commit 5b8874ff05
1 changed file with 28 additions and 13 deletions

@ -1,17 +1,23 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html> <html>
<head> <head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>PDFMiner</title> <title>PDFMiner</title>
</head> <style type="text/css"><!--
blockquote { background: #eeeeee; }
--></style>
</head><body>
<body>
<h1>PDFMiner</h1> <h1>PDFMiner</h1>
<div align=right class=lastmod> <div align=right class=lastmod>
<!-- hhmts start --> <!-- hhmts start -->
Last Modified: Sun Apr 27 20:46:21 JST 2008 Last Modified: Sun Apr 27 20:54:51 JST 2008
<!-- hhmts end --> <!-- hhmts end -->
</div> </div>
<a name="intro"></a>
<hr noshade>
<h2>What's it?</h2>
<p> <p>
PDFMiner is a suite of programs that aims to help PDFMiner is a suite of programs that aims to help
extracting or analyzing text data from PDF documents. extracting or analyzing text data from PDF documents.
@ -49,7 +55,8 @@ http://www.unixuser.org/~euske/python/pdfminer/pdfminer-dist-20080427.tar.gz
http://pdfminerr.googlecode.com/svn/ http://pdfminerr.googlecode.com/svn/
</a> </a>
<hr> <a name="install"></a>
<hr noshade>
<h2>Installation</h2> <h2>Installation</h2>
<p> <p>
Prerequisite: <a href="http://www.python.org/download/">Python</a> 2.4 or newer. Prerequisite: <a href="http://www.python.org/download/">Python</a> 2.4 or newer.
@ -81,7 +88,10 @@ Here is how:
<a href="http://www.unixuser.org/~euske/pub/CMap.tar.bz2"> <a href="http://www.unixuser.org/~euske/pub/CMap.tar.bz2">
http://www.unixuser.org/~euske/pub/CMap.tar.bz2 http://www.unixuser.org/~euske/pub/CMap.tar.bz2
</a> </a>
<li> <code>$ tar jxf CMap.tar.bz2</code> <li> Do the following:
<blockquote><pre>
$ <strong>tar jxf CMap.tar.bz2</strong>
</pre></blockquote>
<li> Put the <code>CMap</code> directory into the <code>pdfminer</code> directory. <li> Put the <code>CMap</code> directory into the <code>pdfminer</code> directory.
<li> Go to the <code>pdfminer</code> directory. <li> Go to the <code>pdfminer</code> directory.
<li> Do the following: (this is optional but highly recommended)<br> <li> Do the following: (this is optional but highly recommended)<br>
@ -90,13 +100,15 @@ $ <strong>make cdbcmap</strong>
</pre></blockquote> </pre></blockquote>
</ol> </ol>
<hr> <a name="usage"></a>
<hr noshade>
<h2>Usage</h2> <h2>Usage</h2>
<p> <p>
PDFMiner comes with two programs: PDFMiner comes with two programs:
<code>pdf2txt.py</code> and <code>dumppdf.py</code>. <code>pdf2txt.py</code> and <code>dumppdf.py</code>.
<a name="pdf2txt"></a>
<h3>pdf2txt.py</h3> <h3>pdf2txt.py</h3>
<p> <p>
<code>pdf2txt.py</code> extracts text contents from a PDF file. <code>pdf2txt.py</code> extracts text contents from a PDF file.
@ -149,6 +161,7 @@ By default, it extracts texts from all the pages.
<dd> Increases the debug level. <dd> Increases the debug level.
</dl> </dl>
<a name="dumppdf"></a>
<h3>dumppdf.py</h3> <h3>dumppdf.py</h3>
<p> <p>
<code>dumppdf.py</code> dumps the internal contents of a PDF file <code>dumppdf.py</code> dumps the internal contents of a PDF file
@ -199,16 +212,18 @@ no stream header is displayed for the ease of saving it to a file.
<dd> Increases the debug level. <dd> Increases the debug level.
</dl> </dl>
<hr> <a name="changes"></a>
<hr noshade>
<h2>Changes</h2> <h2>Changes</h2>
<ul> <ul>
<li> 2008/04/27: Basic encryption and LZW decoding support added. <li> 2008/04/27: Basic encryption and LZW decoding support added.
<li> 2008/01/07: Several bugfixes. Thanks to Nick Fabry for his contribution. <li> 2008/01/07: Several bugfixes. Thanks to Nick Fabry for his contribution.
<li> 2007/12/31: Initial release. <li> 2007/12/31: Initial release.
<li> 2004/12/24: Start writing the code... <li> 2004/12/24: Start writing the code out of boredom...
</ul> </ul>
<hr> <a name="related"></a>
<hr noshade>
<h2>Related Projects</h2> <h2>Related Projects</h2>
<ul> <ul>
<li> <a href="http://pybrary.net/pyPdf/">pyPdf</a> <li> <a href="http://pybrary.net/pyPdf/">pyPdf</a>
@ -216,8 +231,8 @@ no stream header is displayed for the ease of saving it to a file.
<li> <a href="http://www.pdfbox.org/">pdfbox</a> <li> <a href="http://www.pdfbox.org/">pdfbox</a>
</ul> </ul>
<a name="license"></a>
<hr> <hr noshade>
<h2>Terms and conditions</h2> <h2>Terms and conditions</h2>
<p> <p>
<small> <small>
@ -245,6 +260,6 @@ OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
</small> </small>
<hr> <hr noshade>
<address>Yusuke Shinyama</address> <address>Yusuke Shinyama</address>
</body> </body>