# Changelog
All notable changes in pdfminer.six will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [Unreleased]
### Fixed
- Sphinx errors during building of documentation ([#760](https://github.com/pdfminer/pdfminer.six/pull/760))
## [20220524]
### Fixed
- Ignoring (invalid) path constructors that do not begin with `m` ([#749](https://github.com/pdfminer/pdfminer.six/pull/749))
### Changed
- Removed upper version bounds ([#755](https://github.com/pdfminer/pdfminer.six/pull/755))
## [20220506]
### Fixed
- `IndexError` when handling invalid bfrange code map in
CMap ([#731](https://github.com/pdfminer/pdfminer.six/pull/731))
- `TypeError` in lzw.py when `self.table` is not set ([#732](https://github.com/pdfminer/pdfminer.six/pull/732))
- `TypeError` in encodingdb.py when name of unicode is not
str ([#733](https://github.com/pdfminer/pdfminer.six/pull/733))
- `TypeError` in HTMLConverter when using a bytes fontname ([#734](https://github.com/pdfminer/pdfminer.six/pull/734))
### Added
- Exporting images without any specific encoding ([#737](https://github.com/pdfminer/pdfminer.six/pull/737))
### Changed
- Using charset-normalizer instead of chardet for less restrictive license ([#744](https://github.com/pdfminer/pdfminer.six/pull/744))
## [20220319]
### Added
- Export type annotations from PyPI package per PEP 561 ([#679](https://github.com/pdfminer/pdfminer.six/pull/679))
- Support for identity CMaps ([#626](https://github.com/pdfminer/pdfminer.six/pull/626))
- Add support for PDF page labels ([#680](https://github.com/pdfminer/pdfminer.six/pull/680))
- Installation of Pillow as an optional extra dependency ([#714](https://github.com/pdfminer/pdfminer.six/pull/714))
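
For readers unfamiliar with page labels: in the PDF specification (PDF 32000-1:2008, 12.4.2) a label range combines an optional prefix (`/P`), a numbering style (`/S`: `D`, `R`, `r`, `A` or `a`) and a start value (`/St`, default 1); without `/S` the label is the prefix alone. The following is a minimal sketch of formatting one label from those entries. The helpers `to_roman`, `to_letters` and `format_page_label` are hypothetical and are not pdfminer.six's own API.

```python
from typing import Optional

# Roman numeral values in descending order, including subtractive pairs.
_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Uppercase roman numeral for a positive integer."""
    parts = []
    for value, symbol in _ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def to_letters(n: int) -> str:
    """A..Z for 1..26, AA..ZZ for 27..52, AAA..ZZZ for 53..78, and so on."""
    letter = chr(ord("A") + (n - 1) % 26)
    return letter * ((n - 1) // 26 + 1)


def format_page_label(
    offset: int, style: Optional[str], prefix: str = "", start: int = 1
) -> str:
    """Label of the page `offset` pages into a label range (hypothetical helper)."""
    if style is None:
        # No /S entry: the label consists of the prefix only.
        return prefix
    n = start + offset
    if style == "D":
        number = str(n)
    elif style in ("R", "r"):
        number = to_roman(n)
    elif style in ("A", "a"):
        number = to_letters(n)
    else:
        raise ValueError(f"unknown page label style: {style!r}")
    if style.islower():
        number = number.lower()
    return prefix + number


print([format_page_label(i, "r") for i in range(4)])  # ['i', 'ii', 'iii', 'iv']
print(format_page_label(0, "D", prefix="A-", start=5))  # A-5
```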
### Fixed
- Handle decompression error due to CRC checksum error ([#637](https://github.com/pdfminer/pdfminer.six/pull/637))
- Regression (since 20191107) in `LTLayoutContainer.group_textboxes` that returned some text lines out of order ([#659](https://github.com/pdfminer/pdfminer.six/pull/659))
- Add handling of JPXDecode filter to enable extraction of images for some PDFs ([#645](https://github.com/pdfminer/pdfminer.six/pull/645))
- Fix extraction of jbig2 files, which was producing invalid files ([#652](https://github.com/pdfminer/pdfminer.six/pull/653))
- Crash in `pdf2txt.py --boxes-flow=disabled` ([#682](https://github.com/pdfminer/pdfminer.six/pull/682))
- Only use xref fallback if `PDFNoValidXRef` is raised and `fallback` is True ([#684](https://github.com/pdfminer/pdfminer.six/pull/684))
- Ignore empty characters when analyzing layout ([#499](https://github.com/pdfminer/pdfminer.six/pull/499))
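
The `--boxes-flow=disabled` crash came from the string `"disabled"` reaching arithmetic on `laparams.boxes_flow`, which fails with `TypeError: unsupported operand type(s) for -: 'int' and 'str'`. A minimal sketch of converting the option at parse time, so that `disabled` becomes `None` (accepted by `LAParams` since #396) and anything else must be a float. The helper name `float_or_disabled` is an assumption for illustration; the actual pdf2txt.py code may differ.

```python
import argparse
from typing import Optional


def float_or_disabled(value: str) -> Optional[float]:
    """Argparse type: 'disabled' -> None, otherwise the value parsed as a float."""
    if value.strip().lower() == "disabled":
        return None
    try:
        return float(value)
    except ValueError:
        # argparse reports this as a normal usage error instead of a traceback.
        raise argparse.ArgumentTypeError(f"invalid float value: {value!r}")


parser = argparse.ArgumentParser()
# 0.5 is the LAParams default; argparse does not pass non-string defaults to `type`.
parser.add_argument("--boxes-flow", type=float_or_disabled, default=0.5)

args = parser.parse_args(["--boxes-flow=disabled"])
print(args.boxes_flow)  # None
```

Converting in the parser means the layout code only ever sees a float or `None`, never the raw command-line string.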
### Changed
- Replace `warnings.warn` with `logging.Logger.warning` in line with [recommended use](https://docs.python.org/3/howto/logging.html#when-to-use-logging) ([#673](https://github.com/pdfminer/pdfminer.six/pull/673))
- Switched from nose to pytest, from tox to nox and from Travis CI to GitHub Actions ([#704](https://github.com/pdfminer/pdfminer.six/pull/704))
### Removed
- Unnecessary return statements without argument at the end of functions ([#707](https://github.com/pdfminer/pdfminer.six/pull/707))
## [20211012]
### Added
- Add support for PDF 2.0 (ISO 32000-2) AES-256 encryption ([#614](https://github.com/pdfminer/pdfminer.six/pull/614))
- Support for Paeth PNG filter compression (predictor value = 4) ([#537](https://github.com/pdfminer/pdfminer.six/pull/537))
- Type annotations ([#661](https://github.com/pdfminer/pdfminer.six/pull/661))
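
Some context for the Paeth entry: with PNG-style prediction (PDF `/Predictor` values 10 to 15) every row starts with a filter-type byte, and type 4 selects the Paeth filter, which predicts each byte from its left, upper and upper-left neighbours. The sketch below reverses that filter for a single row following the PNG specification. The function names are hypothetical, and this is not the code added in #537.

```python
def paeth_predictor(a: int, b: int, c: int) -> int:
    """Pick left (a), above (b) or upper-left (c), whichever is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    # Ties are broken in the order a, b, c, as the PNG specification requires.
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def unfilter_paeth_row(filtered: bytes, prev_row: bytes, bpp: int) -> bytes:
    """Reverse PNG filter type 4 on one row (filter-type byte already removed).

    prev_row is the previous *decoded* row (all zeros for the first row) and
    bpp is the number of bytes per complete pixel, at least 1.
    """
    out = bytearray(len(filtered))
    for i, x in enumerate(filtered):
        a = out[i - bpp] if i >= bpp else 0
        b = prev_row[i]
        c = prev_row[i - bpp] if i >= bpp else 0
        out[i] = (x + paeth_predictor(a, b, c)) & 0xFF  # arithmetic modulo 256
    return bytes(out)


row1 = unfilter_paeth_row(bytes([10, 5, 3]), bytes(3), bpp=1)
row2 = unfilter_paeth_row(bytes([0, 0, 0]), row1, bpp=1)
print(list(row1), list(row2))  # [10, 15, 18] [10, 15, 18]
```

With an all-zero previous row the Paeth filter reduces to the Sub filter (a running sum), which is why the first row decodes to 10, 15, 18; the all-zero second row then simply repeats the row above.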
### Fixed
- `KeyError` when `'Encrypt'` but not `'ID'` present in `trailer` ([#594](https://github.com/pdfminer/pdfminer.six/pull/594))
- Fix issue of `ValueError` and `KeyError` raised in `PDFDocument` and `PDFParser` ([#573](https://github.com/pdfminer/pdfminer.six/pull/574))
- Fix `TypeError: cannot unpack non-iterable PDFObjRef object` when unpacking the value of `'DW2'` ([#529](https://github.com/pdfminer/pdfminer.six/pull/529))
- Fix `PermissionError` when creating temporary filepaths on Windows when running tests ([#484](https://github.com/pdfminer/pdfminer.six/pull/484))
- Fix `AttributeError` when dumping a TOC with bytes destinations ([#600](https://github.com/pdfminer/pdfminer.six/pull/600))
- Fix issue where some Chinese characters could not be extracted correctly ([#593](https://github.com/pdfminer/pdfminer.six/pull/593))
- Detecting trailer correctly when surrounded with needless whitespace ([#535](https://github.com/pdfminer/pdfminer.six/pull/535))
- Fix `.paint_path` logic for handling single line segments and extracting point-on-curve positions of Bézier path commands ([#530](https://github.com/pdfminer/pdfminer.six/pull/530))
- Raising `UnboundLocalError` when a bad `--output-type` is used ([#610](https://github.com/pdfminer/pdfminer.six/pull/610))
- `TypeError` when using `TagExtractor` with non-string or non-bytes tag values ([#610](https://github.com/pdfminer/pdfminer.six/pull/610))
- Using `io.TextIOBase` as the file to write to ([#616](https://github.com/pdfminer/pdfminer.six/pull/616))
- Parsing `\r\n` after the escape character in a literal string ([#616](https://github.com/pdfminer/pdfminer.six/pull/616))
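
The rule behind the `\r\n` fix comes from the literal-string section of the PDF specification (PDF 32000-1:2008, 7.3.4.2): a backslash directly before an end-of-line marker (CR, LF, or CR LF) continues the string on the next line, and neither the backslash nor the marker belongs to the value. Below is a minimal, hypothetical decoder for the string body that applies this rule together with the standard escape table. It is not pdfminer.six's parser, and it deliberately leaves unescaped end-of-line markers untouched (the specification says those should be read as a single LF).

```python
# Escape sequences from the literal-string table: \n \r \t \b \f \( \) \\
_SIMPLE_ESCAPES = {
    ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t", ord("b"): b"\b",
    ord("f"): b"\f", ord("("): b"(", ord(")"): b")", ord("\\"): b"\\",
}
_CR, _LF, _BACKSLASH = 0x0D, 0x0A, 0x5C


def decode_literal_body(raw: bytes) -> bytes:
    """Decode escapes in the bytes between the outer parentheses of a literal string."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] != _BACKSLASH:
            out.append(raw[i])
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(raw):
            break  # dangling backslash at the end: drop it
        nxt = raw[i]
        if nxt in _SIMPLE_ESCAPES:
            out += _SIMPLE_ESCAPES[nxt]
            i += 1
        elif nxt == _CR:
            # Line continuation: drop backslash and CR, plus an LF right after it.
            i += 1
            if i < len(raw) and raw[i] == _LF:
                i += 1
        elif nxt == _LF:
            i += 1  # line continuation with a bare LF
        elif 0x30 <= nxt <= 0x37:
            # \ddd: one to three octal digits; high-order overflow is ignored.
            j = i
            while j < len(raw) and j - i < 3 and 0x30 <= raw[j] <= 0x37:
                j += 1
            out.append(int(raw[i:j].decode("ascii"), 8) & 0xFF)
            i = j
        # Any other character after a backslash: the backslash is ignored and
        # the character itself is kept by the next loop iteration.
    return bytes(out)


print(decode_literal_body(b"two \\\r\nlines"))  # b'two lines'
```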
### Removed
- Support for Python 3.4 and 3.5 ([#522](https://github.com/pdfminer/pdfminer.six/pull/522))
- Unused dependency on `sortedcontainers` package ([#525](https://github.com/pdfminer/pdfminer.six/pull/525))
- Support for non-standard output streams that are not binary ([#523](https://github.com/pdfminer/pdfminer.six/pull/523))
- Dependency on typing-extensions introduced by [#661](https://github.com/pdfminer/pdfminer.six/pull/661) ([#677](https://github.com/pdfminer/pdfminer.six/pull/677))
## [20201018]
### Deprecated
- Support for Python 3.4 and 3.5 ([#507](https://github.com/pdfminer/pdfminer.six/pull/507))
### Added
- Option to disable boxes flow layout analysis when using pdf2txt ([#479](https://github.com/pdfminer/pdfminer.six/pull/479))
- Support for `pathlib.PurePath` in `open_filename` ([#492](https://github.com/pdfminer/pdfminer.six/pull/492))
### Fixed
- Pass caching parameter to PDFResourceManager in `high_level` functions ([#475](https://github.com/pdfminer/pdfminer.six/pull/475))
- Fix `.paint_path` logic for handling non-rect quadrilaterals and decomposing complex paths ([#512](https://github.com/pdfminer/pdfminer.six/pull/512))
- Fix out-of-bound access on some PDFs ([#483](https://github.com/pdfminer/pdfminer.six/pull/483))
### Removed
- Remove unused rijndael encryption implementation ([#465](https://github.com/pdfminer/pdfminer.six/pull/465))
## [20200726]
### Fixed
- Rename PDFTextExtractionNotAllowedError to PDFTextExtractionNotAllowed to revert breaking change ([#461](https://github.com/pdfminer/pdfminer.six/pull/461))
- Always try to get CMap, not only for identity encodings ([#438](https://github.com/pdfminer/pdfminer.six/pull/438))
## [20200720]
### Added
- Support for painting multiple rectangles at once ([#371](https://github.com/pdfminer/pdfminer.six/pull/371))
### Fixed
- Validate image object in do_EI is a PDFStream ([#451](https://github.com/pdfminer/pdfminer.six/pull/451))
### Changed
- Hiding fallback xref by default from dumppdf.py output ([#431](https://github.com/pdfminer/pdfminer.six/pull/431))
- Raise a warning instead of an error when extracting text from a non-extractable PDF ([#453](https://github.com/pdfminer/pdfminer.six/pull/453))
- Switched from pycryptodome to cryptography package for AES decryption ([#456](https://github.com/pdfminer/pdfminer.six/pull/456))
## [20200517]
### Added
- Python 3 shebang line to scripts in tools ([#408](https://github.com/pdfminer/pdfminer.six/pull/408))
### Fixed
- Fix ordering of textlines within a textbox when `boxes_flow=None` ([#412](https://github.com/pdfminer/pdfminer.six/pull/412))
## [20200402]
### Added
- Allow boxes_flow LAParam to be passed as None, validate the input, and update documentation ([#396](https://github.com/pdfminer/pdfminer.six/pull/396))
- Also accept file-like objects in high level functions `extract_text` and `extract_pages` ([#393](https://github.com/pdfminer/pdfminer.six/pull/393))
### Fixed
- Text no longer comes in reverse order when advanced layout analysis is disabled ([#399](https://github.com/pdfminer/pdfminer.six/pull/399))
- Updated misleading documentation for `word_margin` and `char_margin` ([#407](https://github.com/pdfminer/pdfminer.six/pull/407))
- Ignore ValueError when converting font encoding differences ([#389](https://github.com/pdfminer/pdfminer.six/pull/389))
- Grouping of text lines outside of parent container bounding box ([#386](https://github.com/pdfminer/pdfminer.six/pull/386))
### Changed
- Group text lines if they are centered ([#384](https://github.com/pdfminer/pdfminer.six/pull/384))
## [20200124] - 2020-01-24
### Security
- Removed samples/issue-00152-embedded-pdf.pdf because it contains a possible security threat: a JavaScript-enabled object ([#364](https://github.com/pdfminer/pdfminer.six/pull/364))
## [20200121] - 2020-01-21
### Fixed
- Interpret two's complement integer as unsigned integer ([#352](https://github.com/pdfminer/pdfminer.six/pull/352))
- Fix font name in html output such that it is recognized by browser ([#357](https://github.com/pdfminer/pdfminer.six/pull/357))
- Compute correct font height by removing scaling with font bounding box height ([#348](https://github.com/pdfminer/pdfminer.six/pull/348))
- KeyError when extracting embedded files and a Unicode file specification is missing ([#338](https://github.com/pdfminer/pdfminer.six/pull/338))
### Removed
- The command-line utility latin2ascii.py ([#360](https://github.com/pdfminer/pdfminer.six/pull/360))
## [20200104] - 2020-01-04
### Removed
- Support for Python 2 ([#346](https://github.com/pdfminer/pdfminer.six/pull/346))
### Changed
- Enforce pep8 coding style by adding flake8 to CI ([#345](https://github.com/pdfminer/pdfminer.six/pull/345))
## [20191110] - 2019-11-10
### Fixed
- Wrong order of text box grouping introduced by PR #315 ([#335](https://github.com/pdfminer/pdfminer.six/pull/335))
## [20191107] - 2019-11-07
### Deprecated
- The argument `_py2_no_more_posargs` because Python 2 is removed on January 1, 2020 ([#328](https://github.com/pdfminer/pdfminer.six/pull/328) and
[#307](https://github.com/pdfminer/pdfminer.six/pull/307))
### Added
- Simple wrapper to easily extract text from a PDF file ([#330](https://github.com/pdfminer/pdfminer.six/pull/330))
- Support for extracting JBIG2 encoded images ([#311](https://github.com/pdfminer/pdfminer.six/pull/311) and [#46](https://github.com/pdfminer/pdfminer.six/pull/46))
- Sphinx documentation that is published on
[Read the Docs](https://pdfminersix.readthedocs.io/)
([#329](https://github.com/pdfminer/pdfminer.six/pull/329))
### Fixed
- Unhandled AssertionError when dumping pdf containing reference to object id 0
([#318](https://github.com/pdfminer/pdfminer.six/pull/318))
- Debug flag actually changes logging level to debug for pdf2txt.py and
dumppdf.py ([#325](https://github.com/pdfminer/pdfminer.six/pull/325))
### Changed
- Using argparse instead of getopt for command line interface of dumppdf.py ([#321](https://github.com/pdfminer/pdfminer.six/pull/321))
- Refactor `LTLayoutContainer.group_textboxes` for a significant speed up in layout analysis ([#315](https://github.com/pdfminer/pdfminer.six/pull/315))
### Removed
- Files for external applications such as django, cgi and pyinstaller ([#320](https://github.com/pdfminer/pdfminer.six/pull/320))
## [20191020] - 2019-10-20
### Deprecated
- Support for Python 2 is dropped at January 1st, 2020 ([#307](https://github.com/pdfminer/pdfminer.six/pull/307))
### Added
- Contribution guidelines in [CONTRIBUTING.md](CONTRIBUTING.md) ([#259](https://github.com/pdfminer/pdfminer.six/pull/259))
- Support new encodings OneByteEncoding and DLIdent for CMaps ([#283](https://github.com/pdfminer/pdfminer.six/pull/283))
### Fixed
- Use `six.iteritems()` instead of `dict().iteritems()` to ensure Python2 and Python3 compatibility ([#274](https://github.com/pdfminer/pdfminer.six/pull/274))
- Properly convert Adobe Glyph names to unicode characters ([#263](https://github.com/pdfminer/pdfminer.six/pull/263))
- Allow CMap to be a content stream ([#283](https://github.com/pdfminer/pdfminer.six/pull/283))
- Resolve indirect objects for width and bounding boxes for fonts ([#273](https://github.com/pdfminer/pdfminer.six/pull/273))
- Actually updating stroke color in graphic state ([#298](https://github.com/pdfminer/pdfminer.six/pull/298))
- Interpret (invalid) negative font descent as a positive descent ([#203](https://github.com/pdfminer/pdfminer.six/pull/203))
- Correct colorspace comparison for images ([#132](https://github.com/pdfminer/pdfminer.six/pull/132))
- Allow for bounding boxes with zero height or width by removing assertion ([#246](https://github.com/pdfminer/pdfminer.six/pull/246))
### Changed
- All dependencies are managed in `setup.py` ([#306](https://github.com/pdfminer/pdfminer.six/pull/306) and [#219](https://github.com/pdfminer/pdfminer.six/pull/219))
## [20181108] - 2018-11-08
### Changed
- Speedup layout analysis ([#141](https://github.com/pdfminer/pdfminer.six/pull/141))
- Use argparse to replace deprecated getopt ([#173](https://github.com/pdfminer/pdfminer.six/pull/173))
- Allow pdfminer.six to be compiled with cython ([#142](https://github.com/pdfminer/pdfminer.six/pull/142))