# Changelog
All notable changes in pdfminer.six will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [Unreleased]
### Added
- Support for Paeth PNG filter compression (predictor value = 4) ([#537](https://github.com/pdfminer/pdfminer.six/pull/537))
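"Predictor value = 4" presumably refers to PNG filter type 4 (Paeth). With PNG prediction, each row of a stream is prefixed by a filter-type byte. The sketch below shows how a Paeth-filtered row can be reversed. It is a minimal standalone illustration, not pdfminer.six's own code. The names `paeth_predictor`, `unfilter_paeth_row` and the `bpp` (bytes per pixel) parameter are hypothetical, and it assumes the filter-type byte has already been stripped from `row`.

```python
def paeth_predictor(a: int, b: int, c: int) -> int:
    """Pick whichever of left (a), above (b) or upper-left (c) is closest to a + b - c."""
    p = a + b - c
    pa = abs(p - a)
    pb = abs(p - b)
    pc = abs(p - c)
    # Ties are broken in the order a, b, c, as the PNG specification requires
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def unfilter_paeth_row(row: bytes, prev_row: bytes, bpp: int) -> bytes:
    """Undo Paeth filtering for one row (hypothetical helper).

    prev_row is the already-decoded previous row (all zeros for the first row),
    and bpp is the number of bytes per complete pixel (at least 1).
    """
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = out[i - bpp] if i >= bpp else 0       # decoded byte to the left
        b = prev_row[i]                           # decoded byte above
        c = prev_row[i - bpp] if i >= bpp else 0  # decoded byte above-left
        out[i] = (x + paeth_predictor(a, b, c)) & 0xFF
    return bytes(out)
```

For a first row (previous row all zeros), Paeth reduces to the "Sub" filter, so `unfilter_paeth_row(bytes([1, 1, 1]), bytes(3), 1)` yields `bytes([1, 2, 3])`.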
### Fixed
- `KeyError` when `'Encrypt'` but not `'ID'` present in `trailer` ([#594](https://github.com/pdfminer/pdfminer.six/pull/594))
- Fix `ValueError` and `KeyError` raised in `PDFDocument` and `PDFParser` ([#573](https://github.com/pdfminer/pdfminer.six/pull/574))
- Fix `TypeError: cannot unpack non-iterable PDFObjRef object` when unpacking the value of `'DW2'` ([#529](https://github.com/pdfminer/pdfminer.six/pull/529))
- Fix `PermissionError` when creating temporary file paths on Windows while running tests ([#484](https://github.com/pdfminer/pdfminer.six/pull/484))
- Fix `AttributeError` when dumping a TOC with bytes destinations ([#600](https://github.com/pdfminer/pdfminer.six/pull/600))
- Fix issue where some Chinese characters could not be extracted correctly ([#593](https://github.com/pdfminer/pdfminer.six/pull/593))
- Detect the trailer correctly when it is surrounded by needless whitespace ([#535](https://github.com/pdfminer/pdfminer.six/pull/535))
- Fix `.paint_path` logic for handling single line segments and extracting point-on-curve positions of Bézier path commands ([#530](https://github.com/pdfminer/pdfminer.six/pull/530))
### Removed
- Support for Python 3.4 and 3.5 ([#522](https://github.com/pdfminer/pdfminer.six/pull/522))
- Unused dependency on `sortedcontainers` package ([#525](https://github.com/pdfminer/pdfminer.six/pull/525))
- Support for non-standard output streams that are not binary ([#523](https://github.com/pdfminer/pdfminer.six/pull/523))
## [20201018]
### Deprecated
- Support for Python 3.4 and 3.5 ([#507](https://github.com/pdfminer/pdfminer.six/pull/507))
### Added
- Option to disable boxes flow layout analysis when using pdf2txt ([#479](https://github.com/pdfminer/pdfminer.six/pull/479))
- Support for `pathlib.PurePath` in `open_filename` ([#492](https://github.com/pdfminer/pdfminer.six/pull/492))
### Fixed
- Pass caching parameter to PDFResourceManager in `high_level` functions ([#475](https://github.com/pdfminer/pdfminer.six/pull/475))
- Fix `.paint_path` logic for handling non-rect quadrilaterals and decomposing complex paths ([#512](https://github.com/pdfminer/pdfminer.six/pull/512))
- Fix out-of-bound access on some PDFs ([#483](https://github.com/pdfminer/pdfminer.six/pull/483))
### Removed
- Remove unused rijndael encryption implementation ([#465](https://github.com/pdfminer/pdfminer.six/pull/465))
## [20200726]
### Fixed
- Rename PDFTextExtractionNotAllowedError to PDFTextExtractionNotAllowed to revert breaking change ([#461](https://github.com/pdfminer/pdfminer.six/pull/461))
- Always try to get CMap, not only for identity encodings ([#438](https://github.com/pdfminer/pdfminer.six/pull/438))
## [20200720]
### Added
- Support for painting multiple rectangles at once ([#371](https://github.com/pdfminer/pdfminer.six/pull/371))
### Fixed
- Validate that the image object in `do_EI` is a `PDFStream` ([#451](https://github.com/pdfminer/pdfminer.six/pull/451))
### Changed
- Hide fallback xref by default in dumppdf.py output ([#431](https://github.com/pdfminer/pdfminer.six/pull/431))
- Raise a warning instead of an error when extracting text from a non-extractable PDF ([#453](https://github.com/pdfminer/pdfminer.six/pull/453))
- Switched from pycryptodome to cryptography package for AES decryption ([#456](https://github.com/pdfminer/pdfminer.six/pull/456))
## [20200517]
### Added
- Python 3 shebang line to scripts in tools ([#408](https://github.com/pdfminer/pdfminer.six/pull/408))
### Fixed
- Fix ordering of textlines within a textbox when `boxes_flow=None` ([#412](https://github.com/pdfminer/pdfminer.six/pull/412))
## [20200402]
### Added
- Allow boxes_flow LAParam to be passed as None, validate the input, and update documentation ([#396](https://github.com/pdfminer/pdfminer.six/pull/396))
- Also accept file-like objects in high level functions `extract_text` and `extract_pages` ([#393](https://github.com/pdfminer/pdfminer.six/pull/393))
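The `boxes_flow` entry above mentions validating the input. Below is a minimal sketch of such a check, assuming the rule is "either `None` or a number between -1 and +1 (inclusive)". According to the LAParams documentation, `None` disables advanced layout analysis. The function `validate_boxes_flow` is hypothetical and does not reproduce pdfminer.six's own `LAParams` code.

```python
from typing import Optional, Union

Number = Union[int, float]
ERR_MSG = "boxes_flow should be None, or a number between -1 and +1"


def validate_boxes_flow(boxes_flow: Optional[Number]) -> Optional[Number]:
    """Hypothetical validator: accept None or a number in [-1, +1]."""
    if boxes_flow is None:
        # None is allowed and means "no advanced layout analysis"
        return None
    # bool is a subclass of int, so reject it explicitly
    if isinstance(boxes_flow, bool) or not isinstance(boxes_flow, (int, float)):
        raise TypeError(ERR_MSG)
    if not -1 <= boxes_flow <= 1:
        raise ValueError(ERR_MSG)
    return boxes_flow
```

Raising `TypeError` for non-numbers and `ValueError` for out-of-range numbers keeps the two failure modes distinguishable for callers.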
### Fixed
- Text no longer comes in reverse order when advanced layout analysis is disabled ([#399](https://github.com/pdfminer/pdfminer.six/pull/399))
- Updated misleading documentation for `word_margin` and `char_margin` ([#407](https://github.com/pdfminer/pdfminer.six/pull/407))
- Ignore ValueError when converting font encoding differences ([#389](https://github.com/pdfminer/pdfminer.six/pull/389))
- Grouping of text lines outside of parent container bounding box ([#386](https://github.com/pdfminer/pdfminer.six/pull/386))
### Changed
- Group text lines if they are centered ([#384](https://github.com/pdfminer/pdfminer.six/pull/384))
## [20200124] - 2020-01-24
### Security
- Removed samples/issue-00152-embedded-pdf.pdf because it contains a possible security threat: a JavaScript-enabled object ([#364](https://github.com/pdfminer/pdfminer.six/pull/364))
## [20200121] - 2020-01-21
### Fixed
- Interpret two's complement integer as unsigned integer ([#352](https://github.com/pdfminer/pdfminer.six/pull/352))
- Fix font name in HTML output so that it is recognized by browsers ([#357](https://github.com/pdfminer/pdfminer.six/pull/357))
- Compute correct font height by removing scaling with font bounding box height ([#348](https://github.com/pdfminer/pdfminer.six/pull/348))
- KeyError when extracting embedded files and a Unicode file specification is missing ([#338](https://github.com/pdfminer/pdfminer.six/pull/338))
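The first entry in this list concerns reading a two's complement integer as unsigned. The entry does not say which PDF value was affected. As a general, hedged illustration, a 32-bit signed value can be reinterpreted as unsigned by keeping only its low 32 bits. The helper `to_unsigned32` is hypothetical and not taken from pdfminer.six.

```python
import struct


def to_unsigned32(value: int) -> int:
    """Reinterpret a 32-bit two's complement integer as an unsigned integer."""
    # Masking keeps the low 32 bits: -1 becomes 2**32 - 1 and
    # -2**31 becomes 2**31, while non-negative values are unchanged.
    return value & 0xFFFFFFFF


# struct.pack("<L", -4) would raise struct.error because "L" is unsigned;
# after conversion the packed bytes match the signed encoding of -4.
packed = struct.pack("<L", to_unsigned32(-4))
```

Masking is used rather than adding `2**32` conditionally because it handles negative and non-negative inputs with one expression.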
### Removed
- The command-line utility latin2ascii.py ([#360](https://github.com/pdfminer/pdfminer.six/pull/360))
## [20200104] - 2020-01-04
### Removed
- Support for Python 2 ([#346](https://github.com/pdfminer/pdfminer.six/pull/346))
### Changed
- Enforce pep8 coding style by adding flake8 to CI ([#345](https://github.com/pdfminer/pdfminer.six/pull/345))
## [20191110] - 2019-11-10
### Fixed
- Wrong order of text box grouping introduced by PR #315 ([#335](https://github.com/pdfminer/pdfminer.six/pull/335))
## [20191107] - 2019-11-07
### Deprecated
- The argument `_py2_no_more_posargs`, because Python 2 support is removed on January 1st, 2020 ([#328](https://github.com/pdfminer/pdfminer.six/pull/328) and [#307](https://github.com/pdfminer/pdfminer.six/pull/307))
### Added
- Simple wrapper to easily extract text from a PDF file ([#330](https://github.com/pdfminer/pdfminer.six/pull/330))
- Support for extracting JBIG2 encoded images ([#311](https://github.com/pdfminer/pdfminer.six/pull/311) and [#46](https://github.com/pdfminer/pdfminer.six/pull/46))
- Sphinx documentation that is published on [Read the Docs](https://pdfminersix.readthedocs.io/) ([#329](https://github.com/pdfminer/pdfminer.six/pull/329))
### Fixed
- Unhandled `AssertionError` when dumping a PDF containing a reference to object id 0 ([#318](https://github.com/pdfminer/pdfminer.six/pull/318))
- Debug flag actually changes logging level to debug for pdf2txt.py and dumppdf.py ([#325](https://github.com/pdfminer/pdfminer.six/pull/325))
### Changed
- Using argparse instead of getopt for command line interface of dumppdf.py ([#321](https://github.com/pdfminer/pdfminer.six/pull/321))
- Refactor `LTLayoutContainer.group_textboxes` for a significant speed up in layout analysis ([#315](https://github.com/pdfminer/pdfminer.six/pull/315))
### Removed
- Files for external applications such as django, cgi and pyinstaller ([#320](https://github.com/pdfminer/pdfminer.six/pull/320))
## [20191020] - 2019-10-20
### Deprecated
- Support for Python 2 is dropped on January 1st, 2020 ([#307](https://github.com/pdfminer/pdfminer.six/pull/307))
### Added
- Contribution guidelines in [CONTRIBUTING.md](CONTRIBUTING.md) ([#259](https://github.com/pdfminer/pdfminer.six/pull/259))
- Support new encodings OneByteEncoding and DLIdent for CMaps ([#283](https://github.com/pdfminer/pdfminer.six/pull/283))
### Fixed
- Use `six.iteritems()` instead of `dict().iteritems()` to ensure Python2 and Python3 compatibility ([#274](https://github.com/pdfminer/pdfminer.six/pull/274))
- Properly convert Adobe Glyph names to Unicode characters ([#263](https://github.com/pdfminer/pdfminer.six/pull/263))
- Allow CMap to be a content stream ([#283](https://github.com/pdfminer/pdfminer.six/pull/283))
- Resolve indirect objects for width and bounding boxes for fonts ([#273](https://github.com/pdfminer/pdfminer.six/pull/273))
- Actually update stroke color in graphic state ([#298](https://github.com/pdfminer/pdfminer.six/pull/298))
- Interpret (invalid) negative font descent as a positive descent ([#203](https://github.com/pdfminer/pdfminer.six/pull/203))
- Correct colorspace comparison for images ([#132](https://github.com/pdfminer/pdfminer.six/pull/132))
- Allow for bounding boxes with zero height or width by removing assertion ([#246](https://github.com/pdfminer/pdfminer.six/pull/246))
### Changed
- All dependencies are managed in `setup.py` ([#306](https://github.com/pdfminer/pdfminer.six/pull/306) and [#219](https://github.com/pdfminer/pdfminer.six/pull/219))
## [20181108] - 2018-11-08
### Changed
- Speedup layout analysis ([#141](https://github.com/pdfminer/pdfminer.six/pull/141))
- Use argparse to replace deprecated getopt ([#173](https://github.com/pdfminer/pdfminer.six/pull/173))
- Allow pdfminer.six to be compiled with cython ([#142](https://github.com/pdfminer/pdfminer.six/pull/142))