Updated and fixed the documents.

2013-11-13 14:51:24 +09:00 · 7504d2bf27
parent acad011e3f
commit 7504d2bf27
2 changed files with 31 additions and 29 deletions
--- a/docs/programming.html
+++ b/docs/programming.html
@@ -9,7 +9,7 @@

 <div align=right class=lastmod>
 <!-- hhmts start -->
-Last Modified: Mon Nov 11 10:18:06 UTC 2013
+Last Modified: Wed Nov 13 05:50:56 UTC 2013
 <!-- hhmts end -->
 </div>

@@ -23,9 +23,9 @@ from other applications.
 <ul>
 <li> <a href="#overview">Overview</a>
 <li> <a href="#basic">Basic Usage</a>
-<li> <a href="#layout">Layout Analysis</a>
-<li> <a href="#tocextract">TOC Extraction</a>
-<li> <a href="#extend">Parser Extension</a>
+<li> <a href="#layout">Performing Layout Analysis</a>
+<li> <a href="#tocextract">Obtaining Table of Contents</a>
+<li> <a href="#extend">Extending Functionality</a>
 </ul>

 <h2><a name="overview">Overview</a></h2>
@@ -75,8 +75,12 @@ Figure 1 shows the relationship between the classes in PDFMiner.
 <p>
 A typical way to parse a PDF file is the following:
 <blockquote><pre>
-from pdfminer.pdfparser import PDFParser, PDFDocument
-from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
+from pdfminer.pdfparser import PDFParser
+from pdfminer.pdfdocument import PDFDocument
+from pdfminer.pdfpage import PDFPage
+from pdfminer.pdfpage import PDFTextExtractionNotAllowed
+from pdfminer.pdfinterp import PDFResourceManager
+from pdfminer.pdfinterp import PDFPageInterpreter
 from pdfminer.pdfdevice import PDFDevice

 <span class="comment"># Open a PDF file.</span>
@@ -84,15 +88,12 @@ fp = open('mypdf.pdf', 'rb')
 <span class="comment"># Create a PDF parser object associated with the file object.</span>
 parser = PDFParser(fp)
 <span class="comment"># Create a PDF document object that stores the document structure.</span>
-doc = PDFDocument()
-<span class="comment"># Connect the parser and document objects.</span>
-parser.set_document(doc)
-doc.set_parser(parser)
+document = PDFDocument(parser)
 <span class="comment"># Supply the password for initialization.</span>
 <span class="comment"># (If no password is set, give an empty string.)</span>
-doc.initialize(password)
+document.initialize(password)
 <span class="comment"># Check if the document allows text extraction. If not, abort.</span>
-if not doc.is_extractable:
+if not document.is_extractable:
    raise PDFTextExtractionNotAllowed
 <span class="comment"># Create a PDF resource manager object that stores shared resources.</span>
 rsrcmgr = PDFResourceManager()
@@ -101,11 +102,11 @@ device = PDFDevice(rsrcmgr)
 <span class="comment"># Create a PDF interpreter object.</span>
 interpreter = PDFPageInterpreter(rsrcmgr, device)
 <span class="comment"># Process each page contained in the document.</span>
-for page in doc.get_pages():
+for page in PDFPage.create_pages(document):
    interpreter.process_page(page)
 </pre></blockquote>

-<h2><a name="layout">Accessing Layout Objects</a></h2>
+<h2><a name="layout">Performing Layout Analysis</a></h2>
 <p>
 Here is a typical way to use the layout analysis function:
 <blockquote><pre>
@@ -117,15 +118,15 @@ laparams = LAParams()
 <span class="comment"># Create a PDF page aggregator object.</span>
 device = PDFPageAggregator(rsrcmgr, laparams=laparams)
 interpreter = PDFPageInterpreter(rsrcmgr, device)
-for page in doc.get_pages():
+for page in PDFPage.create_pages(document):
    interpreter.process_page(page)
    <span class="comment"># receive the LTPage object for the page.</span>
    layout = device.get_result()
 </pre></blockquote>

-The layout analyzer gives a "<code>LTPage</code>" object for each page
-in the PDF document. The object contains child objects within the page,
-forming a tree-like structure. Figure 2 shows the relationship between
+A layout analyzer returns an <code>LTPage</code> object for each page
+in the PDF document. This object contains child objects within the page,
+forming a tree structure. Figure 2 shows the relationship between
 these objects.

 <div align=center>
@@ -179,29 +180,29 @@ Could be used for separating text or figures.
 Could be used for framing other pictures or figures.

 <dt> <code>LTCurve</code>
-<dd> Represents a generic bezier curve.
+<dd> Represents a generic Bezier curve.
 </dl>

 <p>
 Also, check out <a href="http://denis.papathanasiou.org/?p=343">a more complete example by Denis Papathanasiou</a>.

-<h2><a name="tocextract">TOC Extraction</a></h2>
+<h2><a name="tocextract">Obtaining Table of Contents</a></h2>
 <p>
 PDFMiner provides functions to access the document's table of contents
 ("Outlines").

 <blockquote><pre>
-from pdfminer.pdfparser import PDFParser, PDFDocument
+from pdfminer.pdfparser import PDFParser
+from pdfminer.pdfdocument import PDFDocument

+<span class="comment"># Open a PDF document.</span>
 fp = open('mypdf.pdf', 'rb')
 parser = PDFParser(fp)
-doc = PDFDocument()
-parser.set_document(doc)
-doc.set_parser(parser)
-doc.initialize(password)
+document = PDFDocument(parser)
+document.initialize(password)

 <span class="comment"># Get the outlines of the document.</span>
-outlines = doc.get_outlines()
+outlines = document.get_outlines()
 for (level,title,dest,a,se) in outlines:
    print (level, title)
 </pre></blockquote>
@@ -209,12 +210,12 @@ for (level,title,dest,a,se) in outlines:
 <p>
 Some PDF documents use page numbers as destinations, while others
 use page numbers and the physical location within the page. Since
-PDF does not have a logical strucutre, and it does not provide a
+PDF does not have a logical structure, and it does not provide a
 way to refer to any in-page object from the outside, there's no
 way to tell exactly which part of text these destinations are
-refering to.
+referring to.

-<h2><a name="extend">Parser Extension</a></h2>
+<h2><a name="extend">Extending Functionality</a></h2>

 <p>
 You can extend <code>PDFPageInterpreter</code> and <code>PDFDevice</code> class
--- a/docs/style.css
+++ b/docs/style.css
@@ -1,3 +1,4 @@
 blockquote { background: #eeeeee; }
 h1 { border-bottom: solid black 2px; }
 h2 { border-bottom: solid black 1px; }
+.comment { color: darkgreen; }
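
Note on the table-of-contents example in docs/programming.html: the updated loop unpacks each outline entry as (level, title, dest, a, se) and prints only the level and title. The sketch below shows one way those two fields could be turned into an indented listing. It is not part of this commit. The helper name format_outlines and the sample entries are hypothetical, and it assumes that top-level entries have level 1. It does not import PDFMiner; the sample list stands in for the result of document.get_outlines().

```python
from typing import Iterable, List, Tuple

# Each entry mirrors the 5-tuple unpacked in the documentation's loop:
# (level, title, dest, a, se). Only level and title are used here.
OutlineEntry = Tuple[int, str, object, object, object]


def format_outlines(outlines: Iterable[OutlineEntry], indent: str = "  ") -> List[str]:
    """Return one line per outline entry, indented by nesting level.

    Assumption: top-level entries have level 1, so they get no indentation.
    """
    lines = []
    for (level, title, dest, a, se) in outlines:
        depth = max(level - 1, 0)  # guard against unexpected levels below 1
        lines.append(indent * depth + title)
    return lines


# Hypothetical sample data standing in for document.get_outlines().
sample_outlines = [
    (1, "Introduction", None, None, None),
    (2, "Background", None, None, None),
    (1, "Results", None, None, None),
]

for line in format_outlines(sample_outlines):
    print(line)
```

The dest, a, and se fields are left untouched here because, as the documentation paragraph in this diff explains, destinations cannot be mapped reliably to a specific piece of text.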