pdfminer.six
============

[![Build Status](https://travis-ci.org/pdfminer/pdfminer.six.svg?branch=master)](https://travis-ci.org/pdfminer/pdfminer.six)
[![PyPI version](https://img.shields.io/pypi/v/pdfminer.six.svg)](https://pypi.python.org/pypi/pdfminer.six/)
[![gitter](https://badges.gitter.im/pdfminer-six/Lobby.svg)](https://gitter.im/pdfminer-six/Lobby?utm_source=badge&utm_medium)

*We fathom PDF*

Pdfminer.six is a community-maintained fork of the original PDFMiner. It is a
tool for extracting information from PDF documents. It focuses on getting
and analyzing text data. Pdfminer.six extracts the text from a page directly
from the source code of the PDF. It can also be used to get the exact location,
font or color of the text.

It is built in a modular way such that each component of pdfminer.six can be
replaced easily. You can implement your own interpreter or rendering device
to use the power of pdfminer.six for purposes other than text analysis.

Check out the full documentation on
[Read the Docs](https://pdfminersix.readthedocs.io).
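
As a rough illustration of reading text together with its location, here is a minimal sketch. It assumes the `extract_pages` function from `pdfminer.high_level` and the `LTTextContainer` class from `pdfminer.layout`; `text_boxes` and `format_box` are hypothetical helper names introduced only for this example.

```python
def text_boxes(pdf_path):
    """Yield (page_number, bbox, text) for each text box found by layout analysis."""
    # Imported lazily so this module still loads where pdfminer.six is not installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_number, page_layout in enumerate(extract_pages(str(pdf_path)), start=1):
        for element in page_layout:
            if isinstance(element, LTTextContainer):
                # bbox is (x0, y0, x1, y1) in PDF coordinates.
                yield page_number, element.bbox, element.get_text().strip()


def format_box(page_number, bbox, text):
    """Format one text box as a single readable line (hypothetical helper)."""
    x0, y0, x1, y1 = bbox
    return f"p{page_number} ({x0:.1f}, {y0:.1f}, {x1:.1f}, {y1:.1f}): {text!r}"
```

Calling `format_box(*box)` for each `box` yielded by `text_boxes("samples/simple1.pdf")` would print one line per text box; the sample path is the one used in the command-line example of this README.
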

Features
--------

* Written entirely in Python.
* Parse, analyze, and convert PDF documents.
* PDF-1.7 specification support (well, almost).
* CJK languages and vertical writing scripts support.
* Various font types (Type1, TrueType, Type3, and CID) support.
* Support for extracting images (JPG, JBIG2 and Bitmaps).
* Support for RC4 and AES encryption.
* Support for AcroForm interactive form extraction.
* Table of contents extraction.
* Tagged contents extraction.
* Automatic layout analysis.

How to use
----------

* Install Python 3.6 or newer.
* Install pdfminer.six:

  `pip install pdfminer.six`

* Use the command-line interface to extract text from a PDF:

  `python pdf2txt.py samples/simple1.pdf`

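
The same extraction can also be driven from Python. The following is a minimal sketch, assuming the `extract_text` function from `pdfminer.high_level`; `pdf_to_text` and `preview` are hypothetical helper names introduced here for illustration and are not part of the library.

```python
from pathlib import Path


def pdf_to_text(pdf_path):
    """Return the text of a PDF as one string via pdfminer.six's high-level API."""
    # Imported lazily so this module still loads where pdfminer.six is not installed.
    from pdfminer.high_level import extract_text

    return extract_text(str(pdf_path))


def preview(text, max_lines=5):
    """Return the first few non-empty lines of extracted text (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[:max_lines]


if __name__ == "__main__":
    sample = Path("samples/simple1.pdf")  # same sample file as the CLI example
    if sample.exists():
        for line in preview(pdf_to_text(sample)):
            print(line)
    else:
        print(f"{sample} not found; pass the path of any PDF instead")
```
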
Contributing
------------
Be sure to read the [contribution guidelines](https://github.com/pdfminer/pdfminer.six/blob/master/CONTRIBUTING.md).