Changelog

All notable changes in pdfminer.six will be documented in this file.

The format is based on Keep a Changelog.

[Unreleased]

Added

Option to disable boxes flow layout analysis when using pdf2txt (#479)
Support for pathlib.PurePath in open_filename (#491)
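
As a rough illustration of what accepting pathlib.PurePath means for callers, the sketch below normalizes a path argument before opening it. The helper name open_any is hypothetical and is not the library's open_filename; it only relies on the fact that PurePath objects convert to strings with str().

```python
import pathlib


def open_any(filename, mode="rb"):
    """Hypothetical helper for illustration; not pdfminer.six code."""
    # PurePath (and subclasses such as Path) converts cleanly to str.
    if isinstance(filename, pathlib.PurePath):
        filename = str(filename)
    if isinstance(filename, str):
        return open(filename, mode)
    # Anything else is assumed to be an already-open file-like object.
    return filename
```

With a helper like this, a caller can pass "sample.pdf", pathlib.Path("sample.pdf"), or an open binary stream without converting it first.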

Fixed

Pass caching parameter to PDFResourceManager in high_level functions (#475)
Fix out-of-bound access on some PDFs (#483)

Removed

Remove unused rijndael encryption implementation (#465)

[20200726]

Fixed

Rename PDFTextExtractionNotAllowedError to PDFTextExtractionNotAllowed to revert breaking change (#461)
Always try to get CMap, not only for identity encodings (#438)

[20200720]

Added

Support for painting multiple rectangles at once (#371)

Fixed

Validate image object in do_EI is a PDFStream (#451)

Changed

Hide fallback xref by default in dumppdf.py output (#431)
Raise a warning instead of an error when extracting text from a non-extractable PDF (#350)
Switched from pycryptodome to cryptography package for AES decryption (#456)
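
The change from an error to a warning for non-extractable PDFs can be pictured with Python's standard warnings module. This is a minimal sketch under assumptions: the warning class and the check_extractable helper are hypothetical names, not the library's API.

```python
import warnings


class TextExtractionNotAllowedWarning(UserWarning):
    """Hypothetical warning class used only in this sketch."""


def check_extractable(is_extractable):
    """Hypothetical helper: warn and continue instead of raising."""
    if not is_extractable:
        warnings.warn(
            "The PDF is marked as non-extractable; text may be incomplete.",
            TextExtractionNotAllowedWarning,
        )
```

A caller who prefers the stricter behavior can promote the warning back to an exception with warnings.simplefilter("error", TextExtractionNotAllowedWarning).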

[20200517]

Added

Python3 shebang line to script in tools (#408)

Fixed

Fix ordering of textlines within a textbox when boxes_flow=None (#411)

[20200402]

Added

Allow boxes_flow LAParam to be passed as None, validate the input, and update documentation (#395)
Also accept file-like objects in high level functions extract_text and extract_pages (#392)

Fixed

Text no longer comes in reverse order when advanced layout analysis is disabled (#398)
Updated misleading documentation for word_margin and char_margin (#407)
Ignore ValueError when converting font encoding differences (#389)
Grouping of text lines outside of parent container bounding box (#386)

Changed

Group text lines if they are centered (#382)

[20200124] - 2020-01-24

Security

Removed samples/issue-00152-embedded-pdf.pdf because it contains a possible security threat: a JavaScript-enabled object (#364)

[20200121] - 2020-01-21

Fixed

Interpret two's complement integer as unsigned integer (#352)
Fix font name in HTML output so that it is recognized by browsers (#357)
Compute correct font height by removing scaling with font bounding box height (#348)
KeyError when extracting embedded files and a Unicode file specification is missing (#338)
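
For readers unfamiliar with the two's complement fix, the general technique of reinterpreting a signed integer as unsigned can be sketched as a bit mask. The helper name as_unsigned and the default width of 32 bits are assumptions made for the example; this is not the library's code.

```python
def as_unsigned(value, bits=32):
    """Reinterpret a signed two's complement integer as unsigned.

    Hypothetical helper; the 32-bit default is an assumption.
    """
    # Masking keeps the low `bits` bits, which maps negative values
    # onto the upper half of the unsigned range.
    return value & ((1 << bits) - 1)


print(as_unsigned(-1))          # 4294967295
print(as_unsigned(5))           # 5
print(as_unsigned(-4, bits=8))  # 252
```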

Removed

The command-line utility latin2ascii.py (#360)

[20200104] - 2020-01-04

Removed

Support for Python 2 (#346)

Changed

Enforce pep8 coding style by adding flake8 to CI (#345)

[20191110] - 2019-11-10

Fixed

Wrong order of text box grouping introduced by PR #315 (#335)

[20191107] - 2019-11-07

Deprecated

The argument _py2_no_more_posargs, because Python 2 support is removed in January 2020 (#328 and #307)

Added

Simple wrapper to easily extract text from a PDF file (#330)
Support for extracting JBIG2 encoded images (#311 and #46)
Sphinx documentation that is published on Read the Docs (#329)

Fixed

Unhandled AssertionError when dumping pdf containing reference to object id 0 (#318)
Debug flag actually changes logging level to debug for pdf2txt.py and dumppdf.py (#325)

Changed

Using argparse instead of getopt for command line interface of dumppdf.py (#321)
Refactor LTLayoutContainer.group_textboxes for a significant speed up in layout analysis (#315)

Removed

Files for external applications such as django, cgi and pyinstaller (#314)

[20191020] - 2019-10-20

Deprecated

Support for Python 2 will be dropped on January 1, 2020 (#307)

Added

Contribution guidelines in CONTRIBUTING.md (#259)
Support new encodings OneByteEncoding and DLIdent for CMaps (#283)

Fixed

Use six.iteritems() instead of dict().iteritems() to ensure Python2 and Python3 compatibility (#274)
Properly convert Adobe Glyph names to unicode characters (#263)
Allow CMap to be a content stream (#283)
Resolve indirect objects for width and bounding boxes for fonts (#273)
Actually updating stroke color in graphic state (#298)
Interpret (invalid) negative font descent as a positive descent (#203)
Correct colorspace comparison for images (#132)
Allow for bounding boxes with zero height or width by removing assertion (#246)

Changed

All dependencies are managed in setup.py (#306 and #219)

[20181108] - 2018-11-08

Changed

Speed up layout analysis (#141)
Use argparse instead of the deprecated getopt (#173)
Allow pdfminer.six to be compiled with cython (#142)