documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@216 1aa58f4a-7d42-0410-adbc-911cccaed67c
pull/1/head
parent 8e92ddca30
commit 479c920ec7
@@ -25,6 +25,10 @@ from other applications.
 <p>
 A typical way to parse a PDF file is the following:
 <blockquote><pre>
+from pdfminer.pdfparser import PDFParser, PDFDocument
+from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
+from pdfminer.pdfdevice import PDFDevice
+
 <span class="comment"># Open a PDF file.</span>
 fp = open('mypdf.pdf', 'rb')
 <span class="comment"># Create a PDF parser object associated with the file object.</span>
@@ -34,7 +38,7 @@ doc = PDFDocument()
 <span class="comment"># Connect the parser and document objects.</span>
 parser.set_document(doc)
 doc.set_parser(parser)
-<span class="comment"># Supply the document password for initialization.</span>
+<span class="comment"># Supply the password for initialization.</span>
 <span class="comment"># (If no password is set, give an empty string.)</span>
 doc.initialize(password)
 <span class="comment"># Check if the document allows text extraction. If not, abort.</span>
@@ -52,12 +56,12 @@ for page in doc.get_pages():
 </pre></blockquote>
 
 <p>
-In PDFMiner, there are several objects involved in parsing a PDF file.
-Figure 1. shows the relationships between these objects.
+In PDFMiner, there are several objects involved in parsing a PDF file,
+as shown in Figure 1.
 
 <div>
 <img src="objrel.png"><br>
-<small>Figure 1. Relationships between objects</small>
+<small>Figure 1. Relationships between PDFMiner objects</small>
 </div>
 
 <a name="layout">
@@ -67,6 +71,9 @@ Figure 1. shows the relationships between these objects.
 PDFMiner performs a basic layout analysis.
 
 <blockquote><pre>
+from pdfminer.layout import LAParams
+from pdfminer.converter import PDFPageAggregator
+
 <span class="comment"># Set parameters for analysis.</span>
 laparams = LAParams()
 <span class="comment"># Create a PDF page aggregator object.</span>
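The documented snippet connects the parser and the document in both directions (parser.set_document(doc), then doc.set_parser(parser)). The following is a minimal, stdlib-only sketch of why such a two-way link can be useful. It assumes the parser needs the document to resolve indirect object references, and the document needs the parser to read objects on demand. ToyParser, ToyDocument, read_object, getobj and the ("ref", objid) encoding are hypothetical illustrations, not PDFMiner's API. Only the set_document/set_parser names are borrowed from the snippet.

```python
class ToyParser:
    """Hypothetical parser: reads raw objects, needs a document to follow references."""

    def __init__(self, raw_objects):
        # objid -> plain value, or ("ref", other_objid) for an indirect reference
        self.raw_objects = raw_objects
        self.doc = None

    def set_document(self, doc):
        self.doc = doc

    def read_object(self, objid):
        value = self.raw_objects[objid]
        # An indirect reference can only be resolved through the document.
        if isinstance(value, tuple) and value[0] == "ref":
            if self.doc is None:
                raise RuntimeError("parser is not connected to a document")
            return self.doc.getobj(value[1])
        return value


class ToyDocument:
    """Hypothetical document: asks its parser for objects when they are requested."""

    def __init__(self):
        self.parser = None

    def set_parser(self, parser):
        self.parser = parser

    def getobj(self, objid):
        if self.parser is None:
            raise RuntimeError("document is not connected to a parser")
        return self.parser.read_object(objid)


parser = ToyParser({1: "Hello", 2: ("ref", 1)})
doc = ToyDocument()
# Connect both directions, in the same order as the documentation snippet.
parser.set_document(doc)
doc.set_parser(parser)
print(doc.getobj(2))  # prints: Hello
```

Without either call, resolving object 2 fails. This is the point of connecting the two objects before any object is read.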
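The imports added in the first hunk (PDFResourceManager, PDFPageInterpreter, PDFDevice) and Figure 1 describe a pipeline in which an interpreter processes each page and reports what it encounters to a device. Below is a rough stdlib-only model of that interpreter-to-device relationship, under stated assumptions. ToyInterpreter, ToyDevice and ToyAggregator are hypothetical. A page's content is reduced to (operator, argument) pairs, only the PDF text-showing operator Tj is handled, and the resource manager is not modelled. The get_result name is borrowed from PDFPageAggregator, but here it simply returns the collected text of the last page.

```python
class ToyDevice:
    """Hypothetical base device: receives page events and ignores them by default."""

    def begin_page(self, pageno):
        pass

    def render_text(self, text):
        pass

    def end_page(self, pageno):
        pass


class ToyAggregator(ToyDevice):
    """Hypothetical aggregating device: keeps the text of the most recent page."""

    def __init__(self):
        self.current = []
        self.result = None

    def begin_page(self, pageno):
        self.current = []

    def render_text(self, text):
        self.current.append(text)

    def end_page(self, pageno):
        self.result = (pageno, " ".join(self.current))

    def get_result(self):
        return self.result


class ToyInterpreter:
    """Hypothetical interpreter: walks page operations and forwards them to a device."""

    def __init__(self, device):
        self.device = device

    def process_page(self, pageno, operations):
        self.device.begin_page(pageno)
        for op, arg in operations:
            if op == "Tj":  # Tj shows a text string in a PDF content stream
                self.device.render_text(arg)
            # Other operators (e.g. "re", a rectangle) are ignored in this sketch.
        self.device.end_page(pageno)


pages = [
    [("Tj", "Hello"), ("re", None), ("Tj", "world")],
    [("Tj", "Second"), ("Tj", "page")],
]
device = ToyAggregator()
interpreter = ToyInterpreter(device)
for pageno, ops in enumerate(pages, start=1):
    interpreter.process_page(pageno, ops)
    print(device.get_result())
# prints: (1, 'Hello world') then (2, 'Second page')
```

Swapping ToyAggregator for another ToyDevice subclass changes what happens to the page content without touching the interpreter. That separation is what the device abstraction is for.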
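The last hunk adds LAParams to the layout-analysis example. To illustrate the kind of grouping such parameters control, the following sketch joins characters on the same baseline into one line when the horizontal gap between neighbours is smaller than a margin scaled by character width. This is a simplified, hypothetical algorithm loosely inspired by the idea behind LAParams' char_margin. It is not PDFMiner's actual layout analysis, which exposes further parameters such as line_overlap, word_margin and line_margin. Char and group_into_lines are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Char:
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # baseline (PDF y grows upward)


def group_into_lines(chars, char_margin=2.0):
    """Group characters into text lines (simplified, single column).

    A character joins the previous line when it sits on exactly the same
    baseline and the gap to the previous character is smaller than
    char_margin times the wider of the two characters.
    """
    lines = []
    # Top of the page first, then left to right.
    for ch in sorted(chars, key=lambda c: (-c.y, c.x0)):
        if lines:
            last = lines[-1][-1]
            gap = ch.x0 - last.x1
            width = max(ch.x1 - ch.x0, last.x1 - last.x0)
            if ch.y == last.y and gap < char_margin * width:
                lines[-1].append(ch)
                continue
        lines.append([ch])
    return ["".join(c.text for c in line) for line in lines]


chars = [
    Char("H", 0, 5, 100), Char("i", 5, 8, 100), Char("!", 40, 43, 100),
    Char("o", 0, 5, 80), Char("k", 5, 10, 80),
]
print(group_into_lines(chars))  # prints: ['Hi', '!', 'ok']
```

Raising the margin (for example char_margin=20) makes the distant "!" join the first line. This shows how a single threshold trades off merging against splitting.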