documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@25 1aa58f4a-7d42-0410-adbc-911cccaed67c
parent
c07b6a8c40
commit
5b8874ff05
41
README.html
|
@ -1,17 +1,23 @@
|
||||||
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
|
||||||
<html>
|
<html>
|
||||||
<head>
|
<head>
|
||||||
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||||
<title>PDFMiner</title>
|
<title>PDFMiner</title>
|
||||||
</head>
|
<style type="text/css"><!--
|
||||||
|
blockquote { background: #eeeeee; }
|
||||||
|
--></style>
|
||||||
|
</head><body>
|
||||||
|
|
||||||
<body>
|
|
||||||
<h1>PDFMiner</h1>
|
<h1>PDFMiner</h1>
|
||||||
|
|
||||||
<div align=right class=lastmod>
|
<div align=right class=lastmod>
|
||||||
<!-- hhmts start -->
|
<!-- hhmts start -->
|
||||||
Last Modified: Sun Apr 27 20:46:21 JST 2008
|
Last Modified: Sun Apr 27 20:54:51 JST 2008
|
||||||
<!-- hhmts end -->
|
<!-- hhmts end -->
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
<a name="intro"></a>
|
||||||
|
<hr noshade>
|
||||||
|
<h2>What's it?</h2>
|
||||||
<p>
|
<p>
|
||||||
PDFMiner is a suite of programs that aims to help
|
PDFMiner is a suite of programs that aims to help
|
||||||
extracting or analyzing text data from PDF documents.
|
extracting or analyzing text data from PDF documents.
|
||||||
|
@ -49,7 +55,8 @@ http://www.unixuser.org/~euske/python/pdfminer/pdfminer-dist-20080427.tar.gz
|
||||||
http://pdfminerr.googlecode.com/svn/
|
http://pdfminerr.googlecode.com/svn/
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
<hr>
|
<a name="install"></a>
|
||||||
|
<hr noshade>
|
||||||
<h2>Installation</h2>
|
<h2>Installation</h2>
|
||||||
<p>
|
<p>
|
||||||
Prerequisite: <a href="http://www.python.org/download/">Python</a> 2.4 or newer.
|
Prerequisite: <a href="http://www.python.org/download/">Python</a> 2.4 or newer.
|
||||||
|
@ -81,7 +88,10 @@ Here is how:
|
||||||
<a href="http://www.unixuser.org/~euske/pub/CMap.tar.bz2">
|
<a href="http://www.unixuser.org/~euske/pub/CMap.tar.bz2">
|
||||||
http://www.unixuser.org/~euske/pub/CMap.tar.bz2
|
http://www.unixuser.org/~euske/pub/CMap.tar.bz2
|
||||||
</a>
|
</a>
|
||||||
<li> <code>$ tar jxf CMap.tar.bz2</code>
|
<li> Do the following:
|
||||||
|
<blockquote><pre>
|
||||||
|
$ <strong>tar jxf CMap.tar.bz2</strong>
|
||||||
|
</pre></blockquote>
|
||||||
<li> Put the <code>CMap</code> directory into the <code>pdfminer</code> directory.
|
<li> Put the <code>CMap</code> directory into the <code>pdfminer</code> directory.
|
||||||
<li> Go to the <code>pdfminer</code> directory.
|
<li> Go to the <code>pdfminer</code> directory.
|
||||||
<li> Do the following: (this is optional but highly recommended)<br>
|
<li> Do the following: (this is optional but highly recommended)<br>
|
||||||
|
@ -90,13 +100,15 @@ $ <strong>make cdbcmap</strong>
|
||||||
</pre></blockquote>
|
</pre></blockquote>
|
||||||
</ol>
|
</ol>
|
||||||
|
|
||||||
<hr>
|
<a name="usage"></a>
|
||||||
|
<hr noshade>
|
||||||
<h2>Usage</h2>
|
<h2>Usage</h2>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
PDFMiner comes with two programs:
|
PDFMiner comes with two programs:
|
||||||
<code>pdf2txt.py</code> and <code>dumppdf.py</code>.
|
<code>pdf2txt.py</code> and <code>dumppdf.py</code>.
|
||||||
|
|
||||||
|
<a name="pdf2txt"></a>
|
||||||
<h3>pdf2txt.py</h3>
|
<h3>pdf2txt.py</h3>
|
||||||
<p>
|
<p>
|
||||||
<code>pdf2txt.py</code> extracts text contents from a PDF file.
|
<code>pdf2txt.py</code> extracts text contents from a PDF file.
|
||||||
|
@ -149,6 +161,7 @@ By default, it extracts texts from all the pages.
|
||||||
<dd> Increases the debug level.
|
<dd> Increases the debug level.
|
||||||
</dl>
|
</dl>
|
||||||
|
|
||||||
|
<a name="dumppdf"></a>
|
||||||
<h3>dumppdf.py</h3>
|
<h3>dumppdf.py</h3>
|
||||||
<p>
|
<p>
|
||||||
<code>dumppdf.py</code> dumps the internal contents of a PDF file
|
<code>dumppdf.py</code> dumps the internal contents of a PDF file
|
||||||
|
@ -199,16 +212,18 @@ no stream header is displayed for the ease of saving it to a file.
|
||||||
<dd> Increases the debug level.
|
<dd> Increases the debug level.
|
||||||
</dl>
|
</dl>
|
||||||
|
|
||||||
<hr>
|
<a name="changes"></a>
|
||||||
|
<hr noshade>
|
||||||
<h2>Changes</h2>
|
<h2>Changes</h2>
|
||||||
<ul>
|
<ul>
|
||||||
<li> 2007/04/27: Basic encryption and LZW decoding support added.
|
<li> 2007/04/27: Basic encryption and LZW decoding support added.
|
||||||
<li> 2007/01/07: Several bugfixes. Thanks to Nick Fabry for his contribution.
|
<li> 2007/01/07: Several bugfixes. Thanks to Nick Fabry for his contribution.
|
||||||
<li> 2007/12/31: Initial release.
|
<li> 2007/12/31: Initial release.
|
||||||
<li> 2004/12/24: Start writing the code...
|
<li> 2004/12/24: Start writing the code out of boredom...
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
<hr>
|
<a name="related"></a>
|
||||||
|
<hr noshade>
|
||||||
<h2>Related Projects</h2>
|
<h2>Related Projects</h2>
|
||||||
<ul>
|
<ul>
|
||||||
<li> <a href="http://pybrary.net/pyPdf/">pyPdf</a>
|
<li> <a href="http://pybrary.net/pyPdf/">pyPdf</a>
|
||||||
|
@ -216,8 +231,8 @@ no stream header is displayed for the ease of saving it to a file.
|
||||||
<li> <a href="http://www.pdfbox.org/">pdfbox</a>
|
<li> <a href="http://www.pdfbox.org/">pdfbox</a>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
|
<a name="license"></a>
|
||||||
<hr>
|
<hr noshade>
|
||||||
<h2>Terms and conditions</h2>
|
<h2>Terms and conditions</h2>
|
||||||
<p>
|
<p>
|
||||||
<small>
|
<small>
|
||||||
|
@ -245,6 +260,6 @@ OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
||||||
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||||
</small>
|
</small>
|
||||||
|
|
||||||
<hr>
|
<hr noshade>
|
||||||
<address>Yusuke Shinyama</address>
|
<address>Yusuke Shinyama</address>
|
||||||
</body>
|
</body>
|
||||||
|
|