html tidy up
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@257 1aa58f4a-7d42-0410-adbc-911cccaed67c
pull/1/head
parent
98442ed943
commit
4f4f03fb2d
108
docs/index.html
|
@@ -5,9 +5,17 @@
|
||||||
<title>PDFMiner</title>
|
<title>PDFMiner</title>
|
||||||
<style type="text/css"><!--
|
<style type="text/css"><!--
|
||||||
blockquote { background: #eeeeee; }
|
blockquote { background: #eeeeee; }
|
||||||
|
h1 { border-bottom: solid black 2px; }
|
||||||
|
h2 { border-bottom: solid black 1px; }
|
||||||
--></style>
|
--></style>
|
||||||
</head><body>
|
</head><body>
|
||||||
|
|
||||||
|
<div align=right class=lastmod>
|
||||||
|
<!-- hhmts start -->
|
||||||
|
Last Modified: Sun Oct 17 09:10:34 UTC 2010
|
||||||
|
<!-- hhmts end -->
|
||||||
|
</div>
|
||||||
|
|
||||||
<h1>PDFMiner</h1>
|
<h1>PDFMiner</h1>
|
||||||
<p>
|
<p>
|
||||||
Python PDF parser and analyzer
|
Python PDF parser and analyzer
|
||||||
|
@@ -17,31 +25,22 @@ Python PDF parser and analyzer
|
||||||
|
|
||||||
<a href="#changes">Recent Changes</a>
|
<a href="#changes">Recent Changes</a>
|
||||||
|
|
||||||
<div align=right class=lastmod>
|
|
||||||
<!-- hhmts start -->
|
|
||||||
Last Modified: Sun Oct 17 05:13:01 UTC 2010
|
|
||||||
<!-- hhmts end -->
|
|
||||||
</div>
|
|
||||||
|
|
||||||
<ul>
|
<ul>
|
||||||
<li> <a href="#intro">What's It?</a>
|
<li> <a href="#intro">What's It?</a>
|
||||||
<li> <a href="#source">Download</a>
|
<li> <a href="#download">Download</a>
|
||||||
<li> <a href="#install">Install</a>
|
<li> <a href="#install">How to Install</a>
|
||||||
<small>(<a href="#cmap">for CJK languages</a>)</small>
|
<small>(<a href="#cmap">for CJK languages</a>)</small>
|
||||||
<li> <a href="#usage">How to Use</a>
|
<li> <a href="#usage">How to Use</a>
|
||||||
<small>(<a href="#pdf2txt">pdf2txt.py</a>,
|
<small>(<a href="#pdf2txt">pdf2txt.py</a>,
|
||||||
<a href="#dumppdf">dumppdf.py</a>,
|
<a href="#dumppdf">dumppdf.py</a>,
|
||||||
<a href="programming.html">use as library</a>)</small>
|
<a href="programming.html">use as library</a>)</small>
|
||||||
<li> <a href="#techdocs">Technical Documents</a>
|
|
||||||
<li> <a href="#todos">TODOs</a>
|
<li> <a href="#todos">TODOs</a>
|
||||||
<li> <a href="#changes">Changes</a>
|
<li> <a href="#changes">Changes</a>
|
||||||
<li> <a href="#related">Related Projects</a>
|
<li> <a href="#related">Related Projects</a>
|
||||||
<li> <a href="#license">Terms and Conditions</a>
|
<li> <a href="#license">Terms and Conditions</a>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
<a name="intro"></a>
|
<h2><a name="intro">What's It?</a></h2>
|
||||||
<hr noshade>
|
|
||||||
<h2>What's It?</h2>
|
|
||||||
<p>
|
<p>
|
||||||
PDFMiner is a tool for extracting information from PDF documents.
|
PDFMiner is a tool for extracting information from PDF documents.
|
||||||
Unlike other PDF-related tools, it focuses entirely on getting
|
Unlike other PDF-related tools, it focuses entirely on getting
|
||||||
|
@@ -51,8 +50,9 @@ other information such as fonts or lines.
|
||||||
It includes a PDF converter that can transform PDF files
|
It includes a PDF converter that can transform PDF files
|
||||||
into other text formats (such as HTML). It has an extensible
|
into other text formats (such as HTML). It has an extensible
|
||||||
PDF parser that can be used for other purposes instead of text analysis.
|
PDF parser that can be used for other purposes instead of text analysis.
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
<strong>Features:</strong>
|
<h3>Features</h3>
|
||||||
<ul>
|
<ul>
|
||||||
<li> Written entirely in Python. (for version 2.4 or newer)
|
<li> Written entirely in Python. (for version 2.4 or newer)
|
||||||
<li> Parse, analyze, and convert PDF documents.
|
<li> Parse, analyze, and convert PDF documents.
|
||||||
|
@@ -66,29 +66,28 @@ PDF parser that can be used for other purposes instead of text analysis.
|
||||||
<li> Reconstruct the original layout by grouping text chunks.
|
<li> Reconstruct the original layout by grouping text chunks.
|
||||||
</ul>
|
</ul>
|
||||||
<p>
|
<p>
|
||||||
On the performance side,
|
|
||||||
PDFMiner is about 20 times slower than
|
PDFMiner is about 20 times slower than
|
||||||
other C/C++-based software such as XPdf.
|
other C/C++-based counterparts such as XPdf.
|
||||||
|
|
||||||
<a name="source"></a>
|
<h3><a name="download">Download</a></h3>
|
||||||
<p>
|
<p>
|
||||||
<strong>Download from PyPI:</strong><br>
|
<strong>Source distribution:</strong><br>
|
||||||
<a href="http://pypi.python.org/pypi/pdfminer/">
|
<a href="http://pypi.python.org/pypi/pdfminer/">
|
||||||
http://pypi.python.org/pypi/pdfminer/
|
http://pypi.python.org/pypi/pdfminer/
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
<P>
|
||||||
|
<strong>SVN repository:</strong><br>
|
||||||
|
<a href="http://code.google.com/p/pdfminerr/source/browse/trunk/pdfminer">
|
||||||
|
http://code.google.com/p/pdfminerr/source/browse/trunk/pdfminer
|
||||||
|
</a>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
<strong>Discussion:</strong> (for questions and comments, post here)<br>
|
<strong>Discussion:</strong> (for questions and comments, post here)<br>
|
||||||
<a href="http://groups.google.com/group/pdfminer-users/">
|
<a href="http://groups.google.com/group/pdfminer-users/">
|
||||||
http://groups.google.com/group/pdfminer-users/
|
http://groups.google.com/group/pdfminer-users/
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
<P>
|
|
||||||
<strong>View the source:</strong><br>
|
|
||||||
<a href="http://code.google.com/p/pdfminerr/source/browse/trunk/pdfminer">
|
|
||||||
http://code.google.com/p/pdfminerr/source/browse/trunk/pdfminer
|
|
||||||
</a>
|
|
||||||
|
|
||||||
<P>
|
<P>
|
||||||
<strong>Online Demo:</strong> (pdf -> html conversion webapp)<br>
|
<strong>Online Demo:</strong> (pdf -> html conversion webapp)<br>
|
||||||
<a href="http://pdf2html.tabesugi.net:8080/">
|
<a href="http://pdf2html.tabesugi.net:8080/">
|
||||||
|
@@ -96,13 +95,10 @@ http://pdf2html.tabesugi.net:8080/
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
|
||||||
<a name="install"></a>
|
<h2><a name="install">How to Install</a></h2>
|
||||||
<hr noshade>
|
|
||||||
<h2>Install</h2>
|
|
||||||
|
|
||||||
<ol>
|
<ol>
|
||||||
<li> Install <a href="http://www.python.org/download/">Python</a> 2.4 or newer.
|
<li> Install <a href="http://www.python.org/download/">Python</a> 2.4 or newer.
|
||||||
(<font color=red><strong>Python 3 is not supported.</strong></font>)
|
(<font color=red><strong>Python 3 is not supported.</strong></font>)
|
||||||
<li> Download the <a href="#source">PDFMiner source</a>.
|
<li> Download the <a href="#source">PDFMiner source</a>.
|
||||||
<li> Unpack it.
|
<li> Unpack it.
|
||||||
<li> Run <code>setup.py</code> to install:<br>
|
<li> Run <code>setup.py</code> to install:<br>
|
||||||
|
@@ -131,9 +127,8 @@ W o r l d
|
||||||
<li> Done!
|
<li> Done!
|
||||||
</ol>
|
</ol>
|
||||||
|
|
||||||
|
<h3><a name="cmap">For CJK languages</a></h3>
|
||||||
<p>
|
<p>
|
||||||
<a name="cmap"></a>
|
|
||||||
<h3>For CJK languages</h3>
|
|
||||||
In order to process CJK languages, you need an additional step to take
|
In order to process CJK languages, you need an additional step to take
|
||||||
during installation:
|
during installation:
|
||||||
<blockquote><pre>
|
<blockquote><pre>
|
||||||
|
@@ -146,6 +141,7 @@ writing 'CNS1_H.py'...
|
||||||
|
|
||||||
# <strong>python setup.py install</strong>
|
# <strong>python setup.py install</strong>
|
||||||
</pre></blockquote>
|
</pre></blockquote>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
On Windows machines which don't have <code>make</code> command,
|
On Windows machines which don't have <code>make</code> command,
|
||||||
paste the following commands on a command line prompt:
|
paste the following commands on a command line prompt:
|
||||||
|
@@ -157,16 +153,12 @@ paste the following commands on a command line prompt:
|
||||||
<strong>python setup.py install</strong>
|
<strong>python setup.py install</strong>
|
||||||
</pre></blockquote>
|
</pre></blockquote>
|
||||||
|
|
||||||
<a name="usage"></a>
|
<h2><a name="usage">How to Use</a></h2>
|
||||||
<hr noshade>
|
|
||||||
<h2>How to Use</h2>
|
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
PDFMiner comes with two handy tools:
|
PDFMiner comes with two handy tools:
|
||||||
<code>pdf2txt.py</code> and <code>dumppdf.py</code>.
|
<code>pdf2txt.py</code> and <code>dumppdf.py</code>.
|
||||||
|
|
||||||
<a name="pdf2txt"></a>
|
<h3><a name="pdf2txt">pdf2txt.py</a></h3>
|
||||||
<h3>pdf2txt.py</h3>
|
|
||||||
<p>
|
<p>
|
||||||
<code>pdf2txt.py</code> extracts text contents from a PDF file.
|
<code>pdf2txt.py</code> extracts text contents from a PDF file.
|
||||||
It extracts all the texts that are to be rendered programmatically,
|
It extracts all the texts that are to be rendered programmatically,
|
||||||
|
@@ -176,11 +168,12 @@ It also extracts the corresponding locations, font names, font sizes, writing
|
||||||
direction (horizontal or vertical) for each text portion.
|
direction (horizontal or vertical) for each text portion.
|
||||||
You need to provide a password for protected PDF documents when its access is restricted.
|
You need to provide a password for protected PDF documents when its access is restricted.
|
||||||
You cannot extract any text from a PDF document which does not have extraction permission.
|
You cannot extract any text from a PDF document which does not have extraction permission.
|
||||||
<p>
|
|
||||||
<strong>Note:</strong> Not all characters in a PDF can be safely converted to Unicode.
|
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Examples:
|
<strong>Note:</strong>
|
||||||
|
Not all characters in a PDF can be safely converted to Unicode.
|
||||||
|
|
||||||
|
<h4>Examples</h4>
|
||||||
<blockquote><pre>
|
<blockquote><pre>
|
||||||
$ <strong>pdf2txt.py -o output.html samples/naacl06-shinyama.pdf</strong>
|
$ <strong>pdf2txt.py -o output.html samples/naacl06-shinyama.pdf</strong>
|
||||||
(extract text as an HTML file whose filename is output.html)
|
(extract text as an HTML file whose filename is output.html)
|
||||||
|
@@ -192,8 +185,7 @@ $ <strong>pdf2txt.py -P mypassword -o output.txt secret.pdf</strong>
|
||||||
(extract a text from an encrypted PDF file)
|
(extract a text from an encrypted PDF file)
|
||||||
</pre></blockquote>
|
</pre></blockquote>
|
||||||
|
|
||||||
<p>
|
<h4>Options</h4>
|
||||||
Options:
|
|
||||||
<dl>
|
<dl>
|
||||||
<dt> <code>-o <em>filename</em></code>
|
<dt> <code>-o <em>filename</em></code>
|
||||||
<dd> Specifies the output file name.
|
<dd> Specifies the output file name.
|
||||||
|
@@ -286,16 +278,14 @@ By default, it extracts all the pages in a document.
|
||||||
<dd> Increases the debug level.
|
<dd> Increases the debug level.
|
||||||
</dl>
|
</dl>
|
||||||
|
|
||||||
<a name="dumppdf"></a>
|
<h3><a name="dumppdf">dumppdf.py</a></h3>
|
||||||
<h3>dumppdf.py</h3>
|
|
||||||
<p>
|
<p>
|
||||||
<code>dumppdf.py</code> dumps the internal contents of a PDF file
|
<code>dumppdf.py</code> dumps the internal contents of a PDF file
|
||||||
in pseudo-XML format. This program is primarily for debugging purposes,
|
in pseudo-XML format. This program is primarily for debugging purposes,
|
||||||
but it's also possible to extract some meaningful contents
|
but it's also possible to extract some meaningful contents
|
||||||
(such as images).
|
(such as images).
|
||||||
|
|
||||||
<p>
|
<h4>Examples</h4>
|
||||||
Examples:
|
|
||||||
<blockquote><pre>
|
<blockquote><pre>
|
||||||
$ <strong>dumppdf.py -a foo.pdf</strong>
|
$ <strong>dumppdf.py -a foo.pdf</strong>
|
||||||
(dump all the headers and contents, except stream objects)
|
(dump all the headers and contents, except stream objects)
|
||||||
|
@@ -307,8 +297,7 @@ $ <strong>dumppdf.py -r -i6 foo.pdf > pic.jpeg</strong>
|
||||||
(extract a JPEG image)
|
(extract a JPEG image)
|
||||||
</pre></blockquote>
|
</pre></blockquote>
|
||||||
|
|
||||||
<p>
|
<h4>Options</h4>
|
||||||
Options:
|
|
||||||
<dl>
|
<dl>
|
||||||
<dt> <code>-a</code>
|
<dt> <code>-a</code>
|
||||||
<dd> Instructs to dump all the objects.
|
<dd> Instructs to dump all the objects.
|
||||||
|
@@ -347,8 +336,7 @@ no stream header is displayed for the ease of saving it to a file.
|
||||||
<dd> Increases the debug level.
|
<dd> Increases the debug level.
|
||||||
</dl>
|
</dl>
|
||||||
|
|
||||||
<a name="library"></a>
|
<h3><a name="library">Use as Library</a></h3>
|
||||||
<h3>Use as Library</h3>
|
|
||||||
<p>
|
<p>
|
||||||
PDFMiner can be used as a library by other Python programs.
|
PDFMiner can be used as a library by other Python programs.
|
||||||
<p>
|
<p>
|
||||||
|
@@ -356,21 +344,7 @@ For details, see the <a href="programming.html">Programming with PDFMiner</a> pa
|
||||||
<p>
|
<p>
|
||||||
Also, check out <a href="http://denis.papathanasiou.org/?p=343">a more complete example by Denis Papathanasiou</a>.
|
Also, check out <a href="http://denis.papathanasiou.org/?p=343">a more complete example by Denis Papathanasiou</a>.
|
||||||
|
|
||||||
<a name="techdocs"></a>
|
<h2><a name="todos">TODOs</a></h2>
|
||||||
<hr noshade>
|
|
||||||
<h2>Technical Documents</h2>
|
|
||||||
<p>
|
|
||||||
<ul>
|
|
||||||
<li> Video:
|
|
||||||
"How to Extract Text Contents from PDF by Hand"
|
|
||||||
<a href="http://www.youtube.com/watch?v=k34wRxaxA_c">(part 1)</a>
|
|
||||||
<a href="http://www.youtube.com/watch?v=_A1M4OdNsiQ">(part 2)</a>
|
|
||||||
<a href="http://www.youtube.com/watch?v=sfV_7cWPgZE">(part 3)</a>
|
|
||||||
</ul>
|
|
||||||
|
|
||||||
<a name="todos"></a>
|
|
||||||
<hr noshade>
|
|
||||||
<h2>TODOs</h2>
|
|
||||||
<ul>
|
<ul>
|
||||||
<li> <A href="http://www.python.org/dev/peps/pep-0008/">PEP-8</a> and
|
<li> <A href="http://www.python.org/dev/peps/pep-0008/">PEP-8</a> and
|
||||||
<a href="http://www.python.org/dev/peps/pep-0257/">PEP-257</a> conformance.
|
<a href="http://www.python.org/dev/peps/pep-0257/">PEP-257</a> conformance.
|
||||||
|
@@ -381,9 +355,7 @@ Also, check out <a href="http://denis.papathanasiou.org/?p=343">a more complete
|
||||||
<li> CCITTFax stream filter support.
|
<li> CCITTFax stream filter support.
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
<a name="changes"></a>
|
<h2><a name="changes">Changes</a></h2>
|
||||||
<hr noshade>
|
|
||||||
<h2>Changes</h2>
|
|
||||||
<ul>
|
<ul>
|
||||||
<li> 2010/10/17: A couple of bugfixes and a minor improvement. Thanks to standardabweichung and Alastair Irving.
|
<li> 2010/10/17: A couple of bugfixes and a minor improvement. Thanks to standardabweichung and Alastair Irving.
|
||||||
<li> 2010/09/07: A minor bugfix. Thanks to Alexander Garden.
|
<li> 2010/09/07: A minor bugfix. Thanks to Alexander Garden.
|
||||||
|
@@ -435,7 +407,6 @@ Also, check out <a href="http://denis.papathanasiou.org/?p=343">a more complete
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
<a name="related"></a>
|
<a name="related"></a>
|
||||||
<hr noshade>
|
|
||||||
<h2>Related Projects</h2>
|
<h2>Related Projects</h2>
|
||||||
<ul>
|
<ul>
|
||||||
<li> <a href="http://pybrary.net/pyPdf/">pyPdf</a>
|
<li> <a href="http://pybrary.net/pyPdf/">pyPdf</a>
|
||||||
|
@@ -445,7 +416,6 @@ Also, check out <a href="http://denis.papathanasiou.org/?p=343">a more complete
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
<a name="license"></a>
|
<a name="license"></a>
|
||||||
<hr noshade>
|
|
||||||
<h2>Terms and Conditions</h2>
|
<h2>Terms and Conditions</h2>
|
||||||
<p>
|
<p>
|
||||||
(This is so-called MIT/X License)
|
(This is so-called MIT/X License)
|
||||||
|
|
|
@@ -5,31 +5,38 @@
|
||||||
<title>Programming with PDFMiner</title>
|
<title>Programming with PDFMiner</title>
|
||||||
<style type="text/css"><!--
|
<style type="text/css"><!--
|
||||||
blockquote { background: #eeeeee; }
|
blockquote { background: #eeeeee; }
|
||||||
|
h1 { border-bottom: solid black 2px; }
|
||||||
|
h2 { border-bottom: solid black 1px; }
|
||||||
.comment { color: darkgreen; }
|
.comment { color: darkgreen; }
|
||||||
--></style>
|
--></style>
|
||||||
</head><body>
|
</head><body>
|
||||||
|
|
||||||
|
<div align=right class=lastmod>
|
||||||
|
<!-- hhmts start -->
|
||||||
|
Last Modified: Sun Oct 17 09:12:03 UTC 2010
|
||||||
|
<!-- hhmts end -->
|
||||||
|
</div>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
<a href="index.html">[Back to PDFMiner homepage]</a>
|
<a href="index.html">[Back to PDFMiner homepage]</a>
|
||||||
|
|
||||||
<h1>Programming with PDFMiner</h1>
|
<h1>Programming with PDFMiner</h1>
|
||||||
<p>
|
<p>
|
||||||
This document explains how to use PDFMiner as a library
|
This page explains how to use PDFMiner as a library
|
||||||
from other applications.
|
from other applications.
|
||||||
<ul>
|
<ul>
|
||||||
<li> <a href="#overview">Overview</a>
|
<li> <a href="#overview">Overview</a>
|
||||||
<li> <a href="#basic">Basic Usage</a>
|
<li> <a href="#basic">Basic Usage</a>
|
||||||
<li> <a href="#layout">Layout Analysis</a>
|
<li> <a href="#layout">Layout Analysis</a>
|
||||||
<li> <a href="#toc">TOC Extraction</a>
|
<li> <a href="#tocextract">TOC Extraction</a>
|
||||||
<li> <a href="#more">more</a>
|
<li> <a href="#extend">Parser Extension</a>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
<a name="overview">
|
<h2><a name="overview">Overview</a></h2>
|
||||||
<hr noshade>
|
|
||||||
<h2>Overview</h2>
|
|
||||||
<p>
|
<p>
|
||||||
<strong>PDF is evil.</strong> Although it is called a PDF
|
<strong>PDF is evil.</strong> Although it is called a PDF
|
||||||
"document", it's nothing like Word or HTML. PDF is more like a
|
"document", it's nothing like Word or HTML document. PDF is more
|
||||||
picture representation. PDF contents are just a bunch of
|
like a graphic representation. PDF contents are just a bunch of
|
||||||
instructions that tell how to place the stuff at each exact
|
instructions that tell how to place the stuff at each exact
|
||||||
position on a display or paper. In most cases, it has no logical
|
position on a display or paper. In most cases, it has no logical
|
||||||
structure such as sentences or paragraphs and it cannot adapt
|
structure such as sentences or paragraphs and it cannot adapt
|
||||||
|
@@ -38,6 +45,13 @@ reconstruct some of those structures by guessing from its
|
||||||
positioning, but there's nothing guaranteed to work. Ugly, I
|
positioning, but there's nothing guaranteed to work. Ugly, I
|
||||||
know. Again, PDF is evil.
|
know. Again, PDF is evil.
|
||||||
|
|
||||||
|
<p>
|
||||||
|
[More technical details about the internal structure of PDF:
|
||||||
|
"How to Extract Text Contents from PDF Manually"
|
||||||
|
<a href="http://www.youtube.com/watch?v=k34wRxaxA_c">(part 1)</a>
|
||||||
|
<a href="http://www.youtube.com/watch?v=_A1M4OdNsiQ">(part 2)</a>
|
||||||
|
<a href="http://www.youtube.com/watch?v=sfV_7cWPgZE">(part 3)</a>]
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Because a PDF file has such a big and complex structure,
|
Because a PDF file has such a big and complex structure,
|
||||||
parsing a PDF file as a whole is time and memory consuming. However,
|
parsing a PDF file as a whole is time and memory consuming. However,
|
||||||
|
@@ -61,9 +75,7 @@ Figure 1 shows the relationship between the classes in PDFMiner.
|
||||||
<small>Figure 1. Relationships between PDFMiner classes</small>
|
<small>Figure 1. Relationships between PDFMiner classes</small>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<a name="basic">
|
<h2><a name="basic">Basic Usage</a></h2>
|
||||||
<hr noshade>
|
|
||||||
<h2>Basic Usage</h2>
|
|
||||||
<p>
|
<p>
|
||||||
A typical way to parse a PDF file is the following:
|
A typical way to parse a PDF file is the following:
|
||||||
<blockquote><pre>
|
<blockquote><pre>
|
||||||
|
@@ -97,9 +109,7 @@ for page in doc.get_pages():
|
||||||
interpreter.process_page(page)
|
interpreter.process_page(page)
|
||||||
</pre></blockquote>
|
</pre></blockquote>
|
||||||
|
|
||||||
<a name="layout">
|
<h2><a name="layout">Accessing Layout Objects</a></h2>
|
||||||
<hr noshade>
|
|
||||||
<h2>Accessing Layout Objects</h2>
|
|
||||||
<p>
|
<p>
|
||||||
Here is a typical way to use the layout analysis function:
|
Here is a typical way to use the layout analysis function:
|
||||||
<blockquote><pre>
|
<blockquote><pre>
|
||||||
|
@@ -174,9 +184,7 @@ Could be used for framing another pictures or figures.
|
||||||
<dd> Represents a polygon in a page.
|
<dd> Represents a polygon in a page.
|
||||||
</dl>
|
</dl>
|
||||||
|
|
||||||
<a name="toc">
|
<h2><a name="tocextract">TOC Extraction</a></h2>
|
||||||
<hr noshade>
|
|
||||||
<h2>TOC Extraction</h2>
|
|
||||||
<p>
|
<p>
|
||||||
PDFMiner provides functions to access the document's table of contents
|
PDFMiner provides functions to access the document's table of contents
|
||||||
("Outlines").
|
("Outlines").
|
||||||
|
@@ -205,9 +213,7 @@ way to refer to any in-page object from the outside, there's no
|
||||||
way to tell exactly which part of text these destinations are
|
way to tell exactly which part of text these destinations are
|
||||||
refering to.
|
refering to.
|
||||||
|
|
||||||
<a name="more">
|
<h2><a name="extend">Parser Extension</a></h2>
|
||||||
<hr noshade>
|
|
||||||
<h2>More</h2>
|
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
You can extend <code>PDFPageInterpreter</code> and <code>PDFDevice</code> class
|
You can extend <code>PDFPageInterpreter</code> and <code>PDFDevice</code> class
|
||||||
|
|
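Note on the anchor renames: this commit renames several in-page anchors (`source` to `download` in docs/index.html, `toc` to `tocextract` and `more` to `extend` in the programming page), while the unchanged install step in docs/index.html still links to `#source`. Judging from the hunks shown here, that link appears to be left dangling. A check along the following lines could catch this kind of leftover. It is only a minimal sketch using the Python standard library; the names `AnchorCollector` and `find_dangling_anchors` and the sample markup are illustrative assumptions, not part of PDFMiner.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect anchor targets (name/id) and in-page links (href="#...")."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # fragment names defined in the page
        self.links = []        # fragment names referenced by href="#..."

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Any element may define a target via id; <a name=...> is the
        # older style used throughout these pages.
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a":
            if attrs.get("name"):
                self.targets.add(attrs["name"])
            href = attrs.get("href") or ""
            if href.startswith("#") and len(href) > 1:
                self.links.append(href[1:])


def find_dangling_anchors(html_text):
    """Return fragment names that are linked to but never defined."""
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    dangling = []
    for name in collector.links:
        if name not in collector.targets and name not in dangling:
            dangling.append(name)
    return dangling


# Hypothetical excerpt mirroring the new side of the index.html diff.
sample = """
<ul>
<li> <a href="#download">Download</a>
<li> <a href="#install">How to Install</a>
</ul>
<h3><a name="download">Download</a></h3>
<h2><a name="install">How to Install</a></h2>
<ol>
<li> Download the <a href="#source">PDFMiner source</a>.
</ol>
"""

print(find_dangling_anchors(sample))
```

Running the same function over the full text of each documentation page after a rename would list any fragment that no longer has a matching `name` or `id`. Links to other pages (such as `programming.html`) are deliberately ignored, since only same-page fragments are checked.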