documentation

git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@216 1aa58f4a-7d42-0410-adbc-911cccaed67c
yusuke.shinyama.dummy 2010-05-05 05:51:22 +00:00
parent 8e92ddca30
commit 479c920ec7
1 changed file with 11 additions and 4 deletions

@@ -25,6 +25,10 @@ from other applications.
 <p>
 A typical way to parse a PDF file is the following:
 <blockquote><pre>
+from pdfminer.pdfparser import PDFParser, PDFDocument
+from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
+from pdfminer.pdfdevice import PDFDevice
 <span class="comment"># Open a PDF file.</span>
 fp = open('mypdf.pdf', 'rb')
 <span class="comment"># Create a PDF parser object associated with the file object.</span>
@@ -34,7 +38,7 @@ doc = PDFDocument()
 <span class="comment"># Connect the parser and document objects.</span>
 parser.set_document(doc)
 doc.set_parser(parser)
-<span class="comment"># Supply the document password for initialization.</span>
+<span class="comment"># Supply the password for initialization.</span>
 <span class="comment"># (If no password is set, give an empty string.)</span>
 doc.initialize(password)
 <span class="comment"># Check if the document allows text extraction. If not, abort.</span>
@@ -52,12 +56,12 @@ for page in doc.get_pages():
 </pre></blockquote>
 <p>
-In PDFMiner, there are several objects involved in parsing a PDF file.
-Figure 1. shows the relationships between these objects.
+In PDFMiner, there are several objects involved in parsing a PDF file,
+as shown in Figure 1.
 <div>
 <img src="objrel.png"><br>
-<small>Figure 1. Relationships between objects</small>
+<small>Figure 1. Relationships between PDFMiner objects</small>
 </div>
 <a name="layout">
@@ -67,6 +71,9 @@ Figure 1. shows the relationships between these objects.
 PDFMiner performs a basic layout analysis.
 <blockquote><pre>
+from pdfminer.layout import LAParams
+from pdfminer.converter import PDFPageAggregator
 <span class="comment"># Set parameters for analysis.</span>
 laparams = LAParams()
 <span class="comment"># Create a PDF page aggregator object.</span>