# Changelog
All notable changes in pdfminer.six will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [Unreleased]
### Added
- Output converter for the hOCR format ([#651](https://github.com/pdfminer/pdfminer.six/pull/651))
- Font name aliases for Arial, Courier New and Times New Roman ([#790](https://github.com/pdfminer/pdfminer.six/pull/790))
### Fixed
- `ValueError` when bmp images with 1 bit channel are decoded ([#773](https://github.com/pdfminer/pdfminer.six/issues/773))
- `ValueError` when trying to decrypt empty metadata values ([#766](https://github.com/pdfminer/pdfminer.six/issues/766))
- Sphinx errors during building of documentation ([#760](https://github.com/pdfminer/pdfminer.six/pull/760))
- `TypeError` when getting default width of font ([#720](https://github.com/pdfminer/pdfminer.six/issues/720))
- Installing typing-extensions on Python 3.6 and 3.7 ([#775](https://github.com/pdfminer/pdfminer.six/pull/775))
- `TypeError` in cmapdb.py when parsing null characters ([#768](https://github.com/pdfminer/pdfminer.six/pull/768))
- Color "convenience operators" now (per spec) also set color space ([#779](https://github.com/pdfminer/pdfminer.six/issues/779))
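
For readers unfamiliar with the last entry: in the PDF specification the convenience operators `G`/`g`, `RG`/`rg` and `K`/`k` select DeviceGray, DeviceRGB and DeviceCMYK respectively, together with a color value (uppercase for stroking, lowercase for non-stroking). The sketch below is a toy model of that behaviour, not pdfminer.six's implementation; `GraphicsState`, `CONVENIENCE_OPERATORS` and `apply_color_operator` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Operator -> (color space, number of operands, applies to stroking?)
CONVENIENCE_OPERATORS = {
    "G": ("DeviceGray", 1, True),
    "g": ("DeviceGray", 1, False),
    "RG": ("DeviceRGB", 3, True),
    "rg": ("DeviceRGB", 3, False),
    "K": ("DeviceCMYK", 4, True),
    "k": ("DeviceCMYK", 4, False),
}


@dataclass
class GraphicsState:
    """Hypothetical, simplified graphics state holding only color data."""

    stroking_space: Optional[str] = None
    stroking_color: Optional[Tuple[float, ...]] = None
    nonstroking_space: Optional[str] = None
    nonstroking_color: Optional[Tuple[float, ...]] = None


def apply_color_operator(state: GraphicsState, operator: str, *operands: float) -> None:
    """Set both the color space and the color value in a single step."""
    space, arity, stroking = CONVENIENCE_OPERATORS[operator]
    if len(operands) != arity:
        raise ValueError(f"{operator} expects {arity} operand(s), got {len(operands)}")
    if stroking:
        state.stroking_space = space
        state.stroking_color = tuple(operands)
    else:
        state.nonstroking_space = space
        state.nonstroking_color = tuple(operands)


state = GraphicsState()
apply_color_operator(state, "rg", 1.0, 0.0, 0.0)  # non-stroking red
```

After the call, only the non-stroking half of the toy state has changed; the stroking color space stays unset until an uppercase operator is applied.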
### Deprecated
- Usage of `if __name__ == "__main__"` where it was only intended for testing purposes ([#756](https://github.com/pdfminer/pdfminer.six/pull/756))
## [20220524]
### Fixed
- Ignoring (invalid) path constructors that do not begin with `m` ([#749](https://github.com/pdfminer/pdfminer.six/pull/749))
### Changed
- Removed upper version bounds ([#755](https://github.com/pdfminer/pdfminer.six/pull/755))
## [20220506]
### Fixed
- `IndexError` when handling invalid bfrange code map in CMap ([#731](https://github.com/pdfminer/pdfminer.six/pull/731))
- `TypeError` in lzw.py when `self.table` is not set ([#732](https://github.com/pdfminer/pdfminer.six/pull/732))
- `TypeError` in encodingdb.py when name of unicode is not str ([#733](https://github.com/pdfminer/pdfminer.six/pull/733))
- `TypeError` in HTMLConverter when using a bytes fontname ([#734](https://github.com/pdfminer/pdfminer.six/pull/734))
### Added
- Exporting images without any specific encoding ([#737](https://github.com/pdfminer/pdfminer.six/pull/737))
### Changed
- Using charset-normalizer instead of chardet for less restrictive license ([#744](https://github.com/pdfminer/pdfminer.six/pull/744))
## [20220319]
### Added
- Export type annotations from PyPI package per PEP 561 ([#679](https://github.com/pdfminer/pdfminer.six/pull/679))
- Support for identity cmaps ([#626](https://github.com/pdfminer/pdfminer.six/pull/626))
- Add support for PDF page labels ([#680](https://github.com/pdfminer/pdfminer.six/pull/680))
- Installation of Pillow as an optional extra dependency ([#714](https://github.com/pdfminer/pdfminer.six/pull/714))
### Fixed
- Handle decompression error due to CRC checksum error ([#637](https://github.com/pdfminer/pdfminer.six/pull/637))
- Regression (since 20191107) in `LTLayoutContainer.group_textboxes` that returned some text lines out of order ([#659](https://github.com/pdfminer/pdfminer.six/pull/659))
- Add handling of JPXDecode filter to enable extraction of images for some pdfs ([#645](https://github.com/pdfminer/pdfminer.six/pull/645))
- Fix extraction of jbig2 files, which was producing invalid files ([#652](https://github.com/pdfminer/pdfminer.six/pull/653))
- Crash in `pdf2txt.py --boxes-flow=disabled` ([#682](https://github.com/pdfminer/pdfminer.six/pull/682))
- Only use xref fallback if `PDFNoValidXRef` is raised and `fallback` is True ([#684](https://github.com/pdfminer/pdfminer.six/pull/684))
- Ignore empty characters when analyzing layout ([#499](https://github.com/pdfminer/pdfminer.six/pull/499))
### Changed
- Replace warnings.warn with logging.Logger.warning in line with [recommended use](https://docs.python.org/3/howto/logging.html#when-to-use-logging) ([#673](https://github.com/pdfminer/pdfminer.six/pull/673))
- Switched from nose to pytest, from tox to nox and from Travis CI to GitHub Actions ([#704](https://github.com/pdfminer/pdfminer.six/pull/704))
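
The switch from `warnings.warn` to `logging.Logger.warning` follows the distinction drawn in the linked logging HOWTO: `warnings.warn` is for issues the calling code should fix, while `logging.Logger.warning` notes events the caller cannot do anything about. A minimal stdlib-only sketch of the two channels follows; `old_api`, `report_missing_xref`, `ListHandler` and the logger name are hypothetical and not part of pdfminer.six.

```python
import logging
import warnings

logger = logging.getLogger("example")  # hypothetical logger name


def old_api() -> None:
    """Avoidable by the caller, so it is signalled with warnings.warn."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)


def report_missing_xref(path: str) -> None:
    """Nothing the caller can change, so it is logged instead."""
    logger.warning("No valid xref table in %s, falling back", path)


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Warnings are captured through the warnings machinery...
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_api()

# ...while log records go to whatever handlers the application attaches.
handler = ListHandler()
logger.addHandler(handler)
report_missing_xref("broken.pdf")
```

An application that wants to silence or redirect the logged message configures the logger or its handlers; warning filters have no effect on it.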
### Removed
- Unnecessary return statements without argument at the end of functions ([#707](https://github.com/pdfminer/pdfminer.six/pull/707))
## [20211012]
### Added
- Add support for PDF 2.0 (ISO 32000-2) AES-256 encryption ([#614](https://github.com/pdfminer/pdfminer.six/pull/614))
- Support for Paeth PNG filter compression (predictor value = 4) ([#537](https://github.com/pdfminer/pdfminer.six/pull/537))
- Type annotations ([#661](https://github.com/pdfminer/pdfminer.six/pull/661))
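
As background for the Paeth entry above: PNG filter type 4 predicts each byte from its left, upper and upper-left neighbours and stores only the difference. The sketch below follows the predictor as defined in the PNG specification; it is not pdfminer.six's code, and `paeth_predictor` and `undo_paeth_row` are illustrative names.

```python
def paeth_predictor(left: int, above: int, upper_left: int) -> int:
    """Return the neighbour closest to the estimate left + above - upper_left."""
    estimate = left + above - upper_left
    dist_left = abs(estimate - left)
    dist_above = abs(estimate - above)
    dist_upper_left = abs(estimate - upper_left)
    # Ties are broken in the order left, above, upper-left.
    if dist_left <= dist_above and dist_left <= dist_upper_left:
        return left
    if dist_above <= dist_upper_left:
        return above
    return upper_left


def undo_paeth_row(filtered: bytes, prior: bytes, bpp: int) -> bytes:
    """Reconstruct one scanline; `prior` is the already decoded previous row.

    `bpp` is the number of bytes per complete pixel. Neighbours that fall
    outside the row are treated as zero.
    """
    row = bytearray(len(filtered))
    for i, value in enumerate(filtered):
        left = row[i - bpp] if i >= bpp else 0
        above = prior[i]
        upper_left = prior[i - bpp] if i >= bpp else 0
        row[i] = (value + paeth_predictor(left, above, upper_left)) & 0xFF
    return bytes(row)


# First row of an image: the previous row is taken to be all zeros.
decoded = undo_paeth_row(bytes([1, 2, 3]), prior=bytes(3), bpp=1)
```

With an all-zero previous row the predictor always picks the left neighbour, so decoding reduces to a running sum of the filtered bytes.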
### Fixed
- `KeyError` when `'Encrypt'` but not `'ID'` present in `trailer` ([#594](https://github.com/pdfminer/pdfminer.six/pull/594))
- Fix `ValueError` and `KeyError` raised in PDFdocument and PDFparser ([#573](https://github.com/pdfminer/pdfminer.six/pull/574))
- Fix `TypeError: cannot unpack non-iterable PDFObjRef object` when unpacking the value of 'DW2' ([#529](https://github.com/pdfminer/pdfminer.six/pull/529))
- Fix `PermissionError` when creating temporary filepaths on Windows when running tests ([#484](https://github.com/pdfminer/pdfminer.six/pull/484))
- Fix `AttributeError` when dumping a TOC with bytes destinations ([#600](https://github.com/pdfminer/pdfminer.six/pull/600))
- Fix incorrect extraction of some Chinese characters ([#593](https://github.com/pdfminer/pdfminer.six/pull/593))
- Detecting trailer correctly when surrounded with needless whitespace ([#535](https://github.com/pdfminer/pdfminer.six/pull/535))
- Fix `.paint_path` logic for handling single line segments and extracting point-on-curve positions of Bézier path commands ([#530](https://github.com/pdfminer/pdfminer.six/pull/530))
- Raising `UnboundLocalError` when a bad `--output-type` is used ([#610](https://github.com/pdfminer/pdfminer.six/pull/610))
- `TypeError` when using `TagExtractor` with non-string or non-bytes tag values ([#610](https://github.com/pdfminer/pdfminer.six/pull/610))
- Using `io.TextIOBase` as the file to write to ([#616](https://github.com/pdfminer/pdfminer.six/pull/616))
- Parsing \r\n after the escape character in a literal string ([#616](https://github.com/pdfminer/pdfminer.six/pull/616))
### Removed
- Support for Python 3.4 and 3.5 ([#522](https://github.com/pdfminer/pdfminer.six/pull/522))
- Unused dependency on `sortedcontainers` package ([#525](https://github.com/pdfminer/pdfminer.six/pull/525))
- Support for non-standard output streams that are not binary ([#523](https://github.com/pdfminer/pdfminer.six/pull/523))
- Dependency on typing-extensions introduced by [#661](https://github.com/pdfminer/pdfminer.six/pull/661) ([#677](https://github.com/pdfminer/pdfminer.six/pull/677))
## [20201018]
### Deprecated
- Support for Python 3.4 and 3.5 ([#507](https://github.com/pdfminer/pdfminer.six/pull/507))
### Added
- Option to disable boxes flow layout analysis when using pdf2txt ([#479](https://github.com/pdfminer/pdfminer.six/pull/479))
- Support for `pathlib.PurePath` in `open_filename` ([#492](https://github.com/pdfminer/pdfminer.six/pull/492))
### Fixed
- Pass caching parameter to PDFResourceManager in `high_level` functions ([#475](https://github.com/pdfminer/pdfminer.six/pull/475))
- Fix `.paint_path` logic for handling non-rect quadrilaterals and decomposing complex paths ([#512](https://github.com/pdfminer/pdfminer.six/pull/512))
- Fix out-of-bound access on some PDFs ([#483](https://github.com/pdfminer/pdfminer.six/pull/483))
### Removed
- Remove unused rijndael encryption implementation ([#465](https://github.com/pdfminer/pdfminer.six/pull/465))
## [20200726]
### Fixed
- Rename PDFTextExtractionNotAllowedError to PDFTextExtractionNotAllowed to revert breaking change ([#461](https://github.com/pdfminer/pdfminer.six/pull/461))
- Always try to get CMap, not only for identity encodings ([#438](https://github.com/pdfminer/pdfminer.six/pull/438))
## [20200720]
### Added
- Support for painting multiple rectangles at once ([#371](https://github.com/pdfminer/pdfminer.six/pull/371))
### Fixed
- Validate image object in do_EI is a PDFStream ([#451](https://github.com/pdfminer/pdfminer.six/pull/451))
### Changed
- Hiding fallback xref by default from dumppdf.py output ([#431](https://github.com/pdfminer/pdfminer.six/pull/431))
- Raise a warning instead of an error when extracting text from a non-extractable PDF ([#453](https://github.com/pdfminer/pdfminer.six/pull/453))
- Switched from pycryptodome to cryptography package for AES decryption ([#456](https://github.com/pdfminer/pdfminer.six/pull/456))
## [20200517]
### Added
- Python3 shebang line to script in tools ([#408](https://github.com/pdfminer/pdfminer.six/pull/408))
### Fixed
- Fix ordering of textlines within a textbox when `boxes_flow=None` ([#412](https://github.com/pdfminer/pdfminer.six/pull/412))
## [20200402]
### Added
- Allow boxes_flow LAParam to be passed as None, validate the input, and update documentation ([#396](https://github.com/pdfminer/pdfminer.six/pull/396))
- Also accept file-like objects in high level functions `extract_text` and `extract_pages` ([#393](https://github.com/pdfminer/pdfminer.six/pull/393))
### Fixed
- Text no longer comes in reverse order when advanced layout analysis is disabled ([#399](https://github.com/pdfminer/pdfminer.six/pull/399))
- Updated misleading documentation for `word_margin` and `char_margin` ([#407](https://github.com/pdfminer/pdfminer.six/pull/407))
- Ignore ValueError when converting font encoding differences ([#389](https://github.com/pdfminer/pdfminer.six/pull/389))
- Grouping of text lines outside of parent container bounding box ([#386](https://github.com/pdfminer/pdfminer.six/pull/386))
### Changed
- Group text lines if they are centered ([#384](https://github.com/pdfminer/pdfminer.six/pull/384))
## [20200124]
### Security
- Removed samples/issue-00152-embedded-pdf.pdf because it contains a possible security threat: a JavaScript-enabled object ([#364](https://github.com/pdfminer/pdfminer.six/pull/364))
## [20200121]
### Fixed
- Interpret two's complement integer as unsigned integer ([#352](https://github.com/pdfminer/pdfminer.six/pull/352))
- Fix font name in html output such that it is recognized by browser ([#357](https://github.com/pdfminer/pdfminer.six/pull/357))
- Compute correct font height by removing scaling with font bounding box height ([#348](https://github.com/pdfminer/pdfminer.six/pull/348))
- KeyError when extracting embedded files and a Unicode file specification is missing ([#338](https://github.com/pdfminer/pdfminer.six/pull/338))
### Removed
- The command-line utility latin2ascii.py ([#360](https://github.com/pdfminer/pdfminer.six/pull/360))
## [20200104]
### Removed
- Support for Python 2 ([#346](https://github.com/pdfminer/pdfminer.six/pull/346))
### Changed
- Enforce pep8 coding style by adding flake8 to CI ([#345](https://github.com/pdfminer/pdfminer.six/pull/345))
## [20191110]
### Fixed
- Wrong order of text box grouping introduced by PR #315 ([#335](https://github.com/pdfminer/pdfminer.six/pull/335))
## [20191107]
### Deprecated
- The argument `_py2_no_more_posargs` because Python 2 support is removed in January 2020 ([#328](https://github.com/pdfminer/pdfminer.six/pull/328) and [#307](https://github.com/pdfminer/pdfminer.six/pull/307))
### Added
- Simple wrapper to easily extract text from a PDF file ([#330](https://github.com/pdfminer/pdfminer.six/pull/330))
- Support for extracting JBIG2 encoded images ([#311](https://github.com/pdfminer/pdfminer.six/pull/311) and [#46](https://github.com/pdfminer/pdfminer.six/pull/46))
- Sphinx documentation that is published on [Read the Docs](https://pdfminersix.readthedocs.io/) ([#329](https://github.com/pdfminer/pdfminer.six/pull/329))
### Fixed
- Unhandled AssertionError when dumping pdf containing reference to object id 0 ([#318](https://github.com/pdfminer/pdfminer.six/pull/318))
- Debug flag actually changes logging level to debug for pdf2txt.py and dumppdf.py ([#325](https://github.com/pdfminer/pdfminer.six/pull/325))
### Changed
- Using argparse instead of getopt for command line interface of dumppdf.py ([#321](https://github.com/pdfminer/pdfminer.six/pull/321))
- Refactor `LTLayoutContainer.group_textboxes` for a significant speed up in layout analysis ([#315](https://github.com/pdfminer/pdfminer.six/pull/315))
### Removed
- Files for external applications such as django, cgi and pyinstaller ([#320](https://github.com/pdfminer/pdfminer.six/pull/320))
## [20191020]
### Deprecated
- Support for Python 2 will be dropped on January 1st, 2020 ([#307](https://github.com/pdfminer/pdfminer.six/pull/307))
### Added
- Contribution guidelines in [CONTRIBUTING.md](CONTRIBUTING.md) ([#259](https://github.com/pdfminer/pdfminer.six/pull/259))
- Support new encodings OneByteEncoding and DLIdent for CMaps ([#283](https://github.com/pdfminer/pdfminer.six/pull/283))
### Fixed
- Use `six.iteritems()` instead of `dict().iteritems()` to ensure Python2 and Python3 compatibility ([#274](https://github.com/pdfminer/pdfminer.six/pull/274))
- Properly convert Adobe Glyph names to unicode characters ([#263](https://github.com/pdfminer/pdfminer.six/pull/263))
- Allow CMap to be a content stream ([#283](https://github.com/pdfminer/pdfminer.six/pull/283))
- Resolve indirect objects for width and bounding boxes for fonts ([#273](https://github.com/pdfminer/pdfminer.six/pull/273))
- Actually updating stroke color in graphic state ([#298](https://github.com/pdfminer/pdfminer.six/pull/298))
- Interpret (invalid) negative font descent as a positive descent ([#203](https://github.com/pdfminer/pdfminer.six/pull/203))
- Correct colorspace comparison for images ([#132](https://github.com/pdfminer/pdfminer.six/pull/132))
- Allow for bounding boxes with zero height or width by removing assertion ([#246](https://github.com/pdfminer/pdfminer.six/pull/246))
### Changed
- All dependencies are managed in `setup.py` ([#306](https://github.com/pdfminer/pdfminer.six/pull/306) and [#219](https://github.com/pdfminer/pdfminer.six/pull/219))
## [20181108]
### Changed
- Speed up layout analysis ([#141](https://github.com/pdfminer/pdfminer.six/pull/141))
- Use argparse instead of the deprecated getopt ([#173](https://github.com/pdfminer/pdfminer.six/pull/173))
- Allow pdfminer.six to be compiled with cython ([#142](https://github.com/pdfminer/pdfminer.six/pull/142))