Add section to documentation with howto for image extraction (#427)

* Make structure of documentation more clear: tutorials, how-to, topics and reference * Add howto for images * Restructure tutorials section, and add install section * Always use up-to-date version * Fix indentation warning in docstring * Add option to dumppdf.py and pdf2txt.py to show version Fixes #162
2020-05-17 17:48:06 +02:00 · 2020-05-17 17:48:06 +02:00 · 91d89af788
parent 7254530d27
commit 91d89af788
19 changed files with 123 additions and 35 deletions
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@ -12,10 +12,12 @@
 import os
 import sys
 import pdfminer
 sys.path.insert(0, os.path.join(
    os.path.abspath(os.path.dirname(__file__)), '../../'))
 # -- Project information -----------------------------------------------------
 project = 'pdfminer.six'
@ -23,7 +25,7 @@ copyright = '2019, Yusuke Shinyama, Philippe Guglielmetti & Pieter Marsman'
 author = 'Yusuke Shinyama, Philippe Guglielmetti & Pieter Marsman'
 # The full version, including alpha/beta/rc tags
-release = '20191020'
+release = pdfminer.__version__
 # -- General configuration ---------------------------------------------------
--- a/docs/source/howto/images.rst
+++ b/docs/source/howto/images.rst
@ -0,0 +1,19 @@
 .. _images:
 How to extract images from a PDF
 ********************************
 Before you start, make sure you have :ref:`installed pdfminer.six<install>`.
 The second thing you need is a PDF with images. If you don't have one,
 you can download `this research paper
 <https://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf>`_
 with images of cats and dogs and save it as `example.pdf`::
    $ curl https://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf --output example.pdf
 Then run the :ref:`pdf2txt<api_pdf2txt>` command::
    $ pdf2txt.py example.pdf --output-dir cats-and-dogs
 This command extracts all the images from the PDF and saves them into the
 `cats-and-dogs` directory.
--- a/docs/source/howto/index.rst
+++ b/docs/source/howto/index.rst
@ -0,0 +1,11 @@
 .. _howto:
 How-to guides
 *************
 How-to guides help you to solve specific problems with pdfminer.six.
 .. toctree::
    :maxdepth: 1
    images
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@ -21,12 +21,23 @@ Check out the source on `github <https://github.com/pdfminer/pdfminer.six>`_.
 Content
 =======
 This documentation is organized into four sections (according to the `Divio
 documentation system <https://documentation.divio.com>`_). The
 :ref:`tutorial` section helps you setup and use pdfminer.six for the first
 time. Read this section if this is your first time working with pdfminer.six.
 The :ref:`howto` offers specific recipies for solving common problems.
 Take a look at the :ref:`topic` if you want more background information on
 how pdfminer.six works internally. The :ref:`reference` provides
 detailed api documentation for all the common classes and functions in
 pdfminer.six.
 .. toctree::
    :maxdepth: 2
-    tutorials/index
+    tutorial/index
-    topics/index
+    howto/index
-    api/index
+    topic/index
    reference/index
 Features
@ -53,16 +64,6 @@ Before using it, you must install it using Python 3.4 or newer.
    $ pip install pdfminer.six
 Common use-cases
 ----------------
 * :ref:`tutorial_commandline` if you just want to extract text from a pdf once.
 * :ref:`tutorial_highlevel` if you want to integrate pdfminer.six with your
  Python code.
 * :ref:`tutorial_composable` when you want to tailor the behavior of
  pdfmine.six to your needs.
 Contributing
 ============
--- a/docs/source/reference/commandline.rst
+++ b/docs/source/reference/commandline.rst
--- a/docs/source/reference/composable.rst
+++ b/docs/source/reference/composable.rst
--- a/docs/source/reference/highlevel.rst
+++ b/docs/source/reference/highlevel.rst
--- a/docs/source/reference/index.rst
+++ b/docs/source/reference/index.rst
@ -1,5 +1,7 @@
-API documentation
+.. _reference:
-*****************
+
 API Reference
 *************
 .. toctree::
    :maxdepth: 2
--- a/docs/source/topics/converting_pdf_to_text.rst
+++ b/docs/source/topics/converting_pdf_to_text.rst
--- a/docs/source/topics/index.rst
+++ b/docs/source/topics/index.rst
@ -1,5 +1,7 @@
-Using pdfminer.six
+.. _topic:
-******************
+
 Topics
 ******
 .. toctree::
    :maxdepth: 2
--- a/docs/source/tutorials/commandline.rst
+++ b/docs/source/tutorials/commandline.rst
@ -1,7 +1,7 @@
 .. _tutorial_commandline:
-Get started with command-line tools
+Extract text from a PDF using the commandline
-***********************************
+*********************************************
 pdfminer.six has several tools that can be used from the command line. The
 command-line tools are aimed at users that occasionally want to extract text
--- a/docs/source/tutorials/composable.rst
+++ b/docs/source/tutorials/composable.rst
@ -1,7 +1,7 @@
 .. _tutorial_composable:
-Get started using the composable components API
+Extract text from a PDF using Python - part 2
-***********************************************
+*********************************************
 The command line tools and the high-level API are just shortcuts for often
 used combinations of pdfminer.six components. You can use these components to
--- a/docs/source/tutorials/highlevel.rst
+++ b/docs/source/tutorials/highlevel.rst
@ -5,8 +5,8 @@
 .. _tutorial_highlevel:
-Get started using the high-level functions
+Extract text from a PDF using Python
-******************************************
+************************************
 The high-level API can be used to do common tasks.
--- a/docs/source/tutorial/index.rst
+++ b/docs/source/tutorial/index.rst
@ -0,0 +1,14 @@
 .. _tutorial:
 Tutorials
 *********
 Tutorials help you get started with specific parts of pdfminer.six.
 .. toctree::
    :maxdepth: 1
    install
    commandline
    highlevel
    composable
--- a/docs/source/tutorial/install.rst
+++ b/docs/source/tutorial/install.rst
@ -0,0 +1,39 @@
 .. _install:
 Install pdfminer.six as a Python package
 ****************************************
 To use pdfminer.six for the first time, you need to install the Python
 package in your Python environment.
 This tutorial requires you to have a system with a working Python and pip
 installation. If you don't have one and don't know how to install it, take a
 look at `The Hitchhiker's Guide to Python! <https://docs.python-guide.org/>`_.
 Install using pip
 =================
 Run the following command on the commandline to install pdfminer.six as a
 Python package::
    pip install pdfminer.six
 Test pdfminer.six installation
 ==============================
 You can test the pdfminer.six installation by importing it in Python.
 Open an interactive Python session from the commandline import pdfminer
 .six::
    >>> import pdfminer
    >>> print(pdfminer.__version__)  # doctest: +IGNORE_RESULT
    '<installed version>'
 Now you can use pdfminer.six as a Python package. But pdfminer.six also
 comes with a couple of useful commandline tools. To test if these tools are
 correctly installed, run the following on your commandline::
    $ pdf2txt.py --version
    pdfminer.six <installed version>
--- a/docs/source/tutorials/index.rst
+++ b/docs/source/tutorials/index.rst
@ -1,9 +0,0 @@
 Getting started
 ***************
 .. toctree::
    :maxdepth: 2
    commandline
    highlevel
    composable
--- a/pdfminer/high_level.py
+++ b/pdfminer/high_level.py
@ -23,7 +23,7 @@ def extract_text_to_fp(inf, outfp, output_type='text', codec='utf-8',
    Takes loads of optional arguments but the defaults are somewhat sane.
    Beware laparams: Including an empty LAParams is not the same as passing
-        None!
+    None!
    :param inf: a file-like object to read PDF structure from, such as a
        file handler (using the builtin `open()` function) or a `BytesIO`.
--- a/tools/dumppdf.py
+++ b/tools/dumppdf.py
@ -6,6 +6,7 @@ import re
 import sys
 from argparse import ArgumentParser
 import pdfminer
 from pdfminer.pdfdocument import PDFDocument, PDFNoOutlines
 from pdfminer.pdfpage import PDFPage
 from pdfminer.pdfparser import PDFParser
@ -243,6 +244,9 @@ def create_parser():
    parser.add_argument('files', type=str, default=None, nargs='+',
                        help='One or more paths to PDF files.')
    parser.add_argument(
        "--version", "-v", action="version",
        version="pdfminer.six v{}".format(pdfminer.__version__))
    parser.add_argument(
        '--debug', '-d', default=False, action='store_true',
        help='Use debug logging level.')
--- a/tools/pdf2txt.py
+++ b/tools/pdf2txt.py
@ -64,6 +64,9 @@ def maketheparser():
        "files", type=str, default=None, nargs="+",
        help="One or more paths to PDF files.")
    parser.add_argument(
        "--version", "-v", action="version",
        version="pdfminer.six v{}".format(pdfminer.__version__))
    parser.add_argument(
        "--debug", "-d", default=False, action="store_true",
        help="Use debug logging level.")