<html>
<head>
<title>PDFMiner</title>
</head>

<body>
<h1>PDFMiner</h1>

<p>
PDFMiner is a suite of programs that helps
extract and analyze text data from PDF documents.

<p>
<strong>Homepage:</strong><br>
<a href="http://www.unixuser.org/~euske/python/pdfminer/index.html">
http://www.unixuser.org/~euske/python/pdfminer/index.html
</a>

<p>
<strong>Download:</strong><br>
<a href="http://www.unixuser.org/~euske/python/pdfminer/pdfminer-dist-20071231.tar.gz">
http://www.unixuser.org/~euske/python/pdfminer/pdfminer-dist-20071231.tar.gz
</a>
(220kbytes)

<p>
<strong>SVN repository:</strong><br>
<a href="http://pdfminerr.googlecode.com/svn/">
http://pdfminerr.googlecode.com/svn/
</a>

<hr>
<h2>Installation</h2>

<ol>
<li> Get
<a href="http://www.unixuser.org/~euske/pub/CMap.tar.bz2">
http://www.unixuser.org/~euske/pub/CMap.tar.bz2
</a>
<li> <code>$ tar jxf CMap.tar.bz2</code>
<li> <code>$ make cdbcmap</code>
</ol>

<hr>
<h2>Usage</h2>

<p>
<strong>Dump the contents:</strong>
<blockquote><pre>
$ ./dumppdf.py -a foo.pdf
</pre></blockquote>

<p>
<strong>Extract the text:</strong>
<blockquote><pre>
$ ./pdf2txt.py samples/naacl06-shinyama.pdf
$ ./pdf2txt.py -c euc-jp samples/jo.pdf
</pre></blockquote>
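
<p>
The extraction commands above can also be driven from a script.
The following is a minimal sketch in Python, assuming that
<code>pdf2txt.py</code> is executable in the current directory and
writes the extracted text to standard output. The helper names
<code>build_pdf2txt_command</code> and <code>extract_text</code> are
hypothetical and are not part of PDFMiner, and the <code>codec</code>
parameter is simply passed through as the <code>-c</code> value.

```python
import subprocess


def build_pdf2txt_command(pdf_path, codec=None, script="./pdf2txt.py"):
    """Build the argument list for one pdf2txt.py run (hypothetical helper)."""
    command = [script]
    if codec is not None:
        # Passed through unchanged, as in the "-c euc-jp" example above.
        command.extend(["-c", codec])
    command.append(pdf_path)
    return command


def extract_text(pdf_path, codec=None, script="./pdf2txt.py"):
    """Run pdf2txt.py and return its standard output as bytes.

    Assumption: the script prints the extracted text to standard output.
    A non-zero exit status raises subprocess.CalledProcessError.
    """
    command = build_pdf2txt_command(pdf_path, codec, script)
    completed = subprocess.run(command, stdout=subprocess.PIPE, check=True)
    return completed.stdout
```

<p>
For example, <code>build_pdf2txt_command("samples/jo.pdf", codec="euc-jp")</code>
returns <code>['./pdf2txt.py', '-c', 'euc-jp', 'samples/jo.pdf']</code>,
which matches the second command above. Keeping the command construction
separate from the call makes it possible to check the arguments without
running the script.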

<hr>
<h2>Similar Projects</h2>
<ul>
<li> <a href="http://www.foolabs.com/xpdf/">xpdf</a>
<li> <a href="http://www.pdfbox.org/">pdfbox</a>
</ul>

<hr>
<h2>Terms and conditions</h2>
<p>
<small>
Copyright (c) 2004-2008 Yusuke Shinyama &lt;yusuke at cs dot nyu dot edu&gt;
<p>
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
<p>
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
<p>
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
</small>

<hr>
<address>Yusuke Shinyama</address>
</body>