pdfminer.six/docs/source/howto/images.rst

.. _images:

How to extract images from a PDF
********************************

Before you start, make sure you have :ref:`installed pdfminer.six<install>`.
The second thing you need is a PDF with images. If you don't have one,
you can download `this research paper
<https://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf>`_
with images of cats and dogs and save it as `example.pdf`::

    $ curl https://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf --output example.pdf

Then run the :ref:`pdf2txt<api_pdf2txt>` command::

    $ pdf2txt.py example.pdf --output-dir cats-and-dogs

This command extracts all the images from the PDF and saves them into the
`cats-and-dogs` directory.
Add section to documentation with howto for image extraction (#427) * Make structure of documentation more clear: tutorials, how-to, topics and reference * Add howto for images * Restructure tutorials section, and add install section * Always use up-to-date version * Fix indentation warning in docstring * Add option to dumppdf.py and pdf2txt.py to show version Fixes #162 2020-05-17 15:48:06 +00:00			`.. _images:`

			`How to extract images from a PDF`
			`********************************`

			Before you start, make sure you have :ref:`installed pdfminer.six<install>`.
			`The second thing you need is a PDF with images. If you don't have one,`
			you can download `this research paper
			<https://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf>`_
			with images of cats and dogs and save it as `example.pdf`::

			`$ curl https://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf --output example.pdf`

			Then run the :ref:`pdf2txt<api_pdf2txt>` command::

			`$ pdf2txt.py example.pdf --output-dir cats-and-dogs`

			`This command extracts all the images from the PDF and saves them into the`
			`cats-and-dogs` directory.