Community-maintained fork of pdfminer - we fathom PDF
 
 
yusuke.shinyama.dummy 5c1aa960f5 git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@18 1aa58f4a-7d42-0410-adbc-911cccaed67c 2008-02-03 11:47:24 +00:00
samples add samples, fixed silly bugs. 2007-12-31 05:02:15 +00:00
Makefile oops, version number bumped. 2008-01-07 14:15:29 +00:00
README.html git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@13 1aa58f4a-7d42-0410-adbc-911cccaed67c 2008-01-09 14:21:24 +00:00
cmap.py Restructuring core lexical handlings. 2008-02-03 09:36:34 +00:00
conv_afm.py initial import. 2007-12-30 09:13:51 +00:00
conv_cmap.py add samples, fixed silly bugs. 2007-12-31 05:02:15 +00:00
dumppdf.py yum-yum! 2008-01-07 13:47:52 +00:00
extent.py git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@14 1aa58f4a-7d42-0410-adbc-911cccaed67c 2008-01-09 14:40:04 +00:00
fontmetrics.py added license texts. 2007-12-31 04:10:03 +00:00
glyphlist.py added license texts. 2007-12-31 04:10:03 +00:00
latin_enc.py initial import. 2007-12-30 09:13:51 +00:00
pdf2txt.py Restructuring core lexical handlings. 2008-02-03 09:36:34 +00:00
pdfinterp.py git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@18 1aa58f4a-7d42-0410-adbc-911cccaed67c 2008-02-03 11:47:24 +00:00
pdfparser.py Restructuring core lexical handlings. 2008-02-03 09:36:34 +00:00
psparser.py git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@18 1aa58f4a-7d42-0410-adbc-911cccaed67c 2008-02-03 11:47:24 +00:00
pycdb.py documentation. 2007-12-31 04:40:27 +00:00
utils.py git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@13 1aa58f4a-7d42-0410-adbc-911cccaed67c 2008-01-09 14:21:24 +00:00

README.html

<html>
<head>
<title>PDFMiner</title>
</head>

<body>
<h1>PDFMiner</h1>

<p>
PDFMiner is a suite of programs for extracting and analyzing
text data from PDF documents.
Unlike other PDF-related tools, it lets you obtain
the exact location of each piece of text on a page, as well as
other layout information such as font size and font name,
which can be useful for analyzing the document.
PDFMiner is written purely in Python. It can also be used as a
basis for a full-fledged PDF interpreter.

<p>
<strong>Homepage:</strong><br>
<a href="http://www.unixuser.org/~euske/python/pdfminer/index.html">
http://www.unixuser.org/~euske/python/pdfminer/index.html
</a>

<p>
<strong>Download:</strong><br>
<a href="http://www.unixuser.org/~euske/python/pdfminer/pdfminer-dist-20080107.tar.gz">
http://www.unixuser.org/~euske/python/pdfminer/pdfminer-dist-20080107.tar.gz
</a>
(220 KB)

<p>
<strong>SVN repository:</strong><br>
<a href="http://pdfminerr.googlecode.com/svn/">
http://pdfminerr.googlecode.com/svn/
</a>

<hr>
<h2>Installation</h2>

<ol>
<li> Download
<a href="http://www.unixuser.org/~euske/pub/CMap.tar.bz2">
http://www.unixuser.org/~euske/pub/CMap.tar.bz2
</a>
<li> <code>$ tar jxf CMap.tar.bz2</code>
<li> <code>$ make cdbcmap</code>
</ol>
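<p>
The unpack and build steps above can also be scripted. The following is a
minimal sketch, assuming Python 3 with its standard library and an archive
that has already been downloaded. The function names
<code>extract_cmap_archive</code> and <code>build_cmap_database</code> are
hypothetical helpers and are not part of PDFMiner. The README does not say
where the archive should be unpacked; the sketch assumes the PDFMiner source
directory.

```python
import subprocess
import tarfile
from pathlib import Path


def extract_cmap_archive(archive_path, dest_dir="."):
    """Unpack a bzip2-compressed tar archive, like `tar jxf`.

    Only use this on an archive you trust: extractall() writes every
    member of the archive to disk.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:bz2") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    # Return the member names so the caller can see what was unpacked.
    return names


def build_cmap_database(source_dir="."):
    """Run `make cdbcmap` in source_dir (assumed to hold the Makefile)."""
    subprocess.run(["make", "cdbcmap"], cwd=source_dir, check=True)
```

<p>
Calling <code>extract_cmap_archive("CMap.tar.bz2")</code> followed by
<code>build_cmap_database()</code> from the source directory should mirror
steps 2 and 3. Because of <code>check=True</code>, a failing
<code>make</code> raises <code>subprocess.CalledProcessError</code> instead
of passing silently.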

<hr>
<h2>Usage</h2>

<p>
<strong>Dump the contents of a PDF file as pseudo-XML:</strong>
<blockquote><pre>
$ ./dumppdf.py -a foo.pdf
</pre></blockquote>

<p>
<strong>Extract the text:</strong>
<blockquote><pre>
$ ./pdf2txt.py samples/naacl06-shinyama.pdf
$ ./pdf2txt.py -c euc-jp samples/jo.pdf
</pre></blockquote>
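<p>
To run the extraction from another Python program, the same command lines
can be assembled and executed with <code>subprocess</code>. This is a
sketch under stated assumptions: it targets Python 3, the helper names
<code>build_pdf2txt_command</code> and <code>extract_text</code> are
hypothetical, and it assumes that <code>pdf2txt.py</code> writes the
extracted text to standard output, which this README does not state. Only
the <code>-c</code> option shown above is used.

```python
import subprocess


def build_pdf2txt_command(pdf_path, codec=None, script="./pdf2txt.py"):
    """Build the argument list for one pdf2txt.py run.

    codec is passed through the -c option, as in `-c euc-jp` above.
    """
    command = [script]
    if codec is not None:
        command += ["-c", codec]
    command.append(pdf_path)
    return command


def extract_text(pdf_path, codec=None):
    """Run pdf2txt.py and return whatever it wrote to stdout, as bytes.

    Assumption: the script prints the extracted text to standard output.
    """
    result = subprocess.run(
        build_pdf2txt_command(pdf_path, codec),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

<p>
For example, <code>build_pdf2txt_command("samples/jo.pdf", "euc-jp")</code>
reproduces the second command line shown above. The output is returned as
bytes so that the caller can decode it with the same codec that was passed
to <code>-c</code>.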

<hr>
<h2>Similar Projects</h2>
<ul>
<li> <a href="http://pybrary.net/pyPdf/">pyPdf</a>
<li> <a href="http://www.foolabs.com/xpdf/">xpdf</a>
<li> <a href="http://www.pdfbox.org/">pdfbox</a>
</ul>


<hr>
<h2>Terms and conditions</h2>
<p>
<small>
Copyright (c) 2004-2008  Yusuke Shinyama &lt;yusuke at cs dot nyu dot edu&gt;
<p>
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
<p>
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
<p>
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
</small>

<hr>
<address>Yusuke Shinyama</address>
</body>