Section 4.5 of the PDF reference says: "Color values are interpreted
according to the current color space, another parameter of the graphics
state. A PDF content stream first selects a color space by invoking the
CS operator (for the stroking color) or the cs operator (for the
non-stroking color). It then selects color values within that color
space with the SC operator (stroking) or the sc operator (nonstroking).
There are also convenience operators—G, g, RG, rg, K, and k—that select
both a color space and a color value within it in a single step."
Previously, those convenience operators did *not* set the color space.
This commit, following up on issue #779, fixes this. It also adds a
test to demonstrate that, at least for the do_rg method, the fix works
as intended.
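A minimal sketch of the intended behaviour, for illustration only. The
`GraphicsStateSketch` class and its attribute names are hypothetical and
are not pdfminer.six's actual graphics state; only the operator names
(rg, RG, g) and the `do_rg` method name come from the text above.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Color = Union[float, Tuple[float, ...]]


@dataclass
class GraphicsStateSketch:
    """Hypothetical stand-in for a graphics state (not the real class)."""

    stroking_space: str = "DeviceGray"
    non_stroking_space: str = "DeviceGray"
    stroking_color: Color = 0.0
    non_stroking_color: Color = 0.0


def do_rg(state: GraphicsStateSketch, r: float, g: float, b: float) -> None:
    """rg: select DeviceRGB *and* a color for non-stroking operations."""
    state.non_stroking_space = "DeviceRGB"
    state.non_stroking_color = (r, g, b)


def do_RG(state: GraphicsStateSketch, r: float, g: float, b: float) -> None:
    """RG: select DeviceRGB *and* a color for stroking operations."""
    state.stroking_space = "DeviceRGB"
    state.stroking_color = (r, g, b)


def do_g(state: GraphicsStateSketch, gray: float) -> None:
    """g: select DeviceGray *and* a gray level for non-stroking operations."""
    state.non_stroking_space = "DeviceGray"
    state.non_stroking_color = gray
```

The point is that each convenience operator writes both fields in one
step, so a later reader of the state never sees an RGB triple paired
with a stale color space.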
* Ignore path constructors that do not begin with m
Per PDF Reference Section 4.4.1, "path construction operators may be
invoked in any sequence, but the first one invoked must be m or re to
begin a new subpath." Since pdfminer.six already converts all `re`
(rectangle) operators to their equivalent `mlllh` representation, paths
ingested by `.paint_path(...)` that do not begin with the `m` operator
are invalid.
In addition to the advantage of hewing to the PDF Reference, this change
also avoids the `ValueError: not enough values to unpack (expected 2,
got 1)` error raised by the `pts = [apply_matrix_pt(self.ctm, pt) for
pt in raw_pts]` line in `converter.py` when parsing PDFs that
(erroneously) include `("h",)` paths.
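The check described above can be sketched roughly as follows. The helper
names `path_shape` and `is_valid_path` are hypothetical; the sketch
assumes a path is a list of tuples whose first item is the operator
letter, as in the `("h",)` example.

```python
from typing import List, Tuple

# A path segment is the operator letter followed by its numeric operands,
# e.g. ("m", 0, 0), ("l", 10, 0) or ("h",).
PathSegment = Tuple


def path_shape(path: List[PathSegment]) -> str:
    """Concatenate the operator letters, e.g. 'mlllh' for a rectangle."""
    return "".join(segment[0] for segment in path)


def is_valid_path(path: List[PathSegment]) -> bool:
    """Return True only if the path is non-empty and starts with 'm'."""
    return bool(path) and path[0][0] == "m"
```

A caller would skip any path for which `is_valid_path` is false, so a
lone `("h",)` never reaches the point-extraction step.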
* Update CHANGELOG.md
Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
* Replace tox with nox
* Replace travis with github actions
* Fix pytest, mypy and flake8 errors
* Add pytest.
* Run on all commits
* Remove nose
* Speedup slow tests to save GitHub actions minutes
* Added line to CHANGELOG.md
* Fix line too long in pdfdocument.py
* Update .github/workflows/actions.yml
Co-authored-by: Jake Stockwin <jstockwin@gmail.com>
* Improve actions.yml
* Fix error with nox name for mypy
* Add names for jobs
* Replace nose.raises with pytest.raises
Co-authored-by: Jake Stockwin <jstockwin@gmail.com>
* detect TextIOWrapper as non-binary
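One possible way to make this distinction, shown only as a sketch: the
function name `is_binary_stream` and the fallback on the `mode`
attribute are assumptions for illustration and may differ from the
actual change.

```python
import io
from typing import Any


def is_binary_stream(stream: Any) -> bool:
    """Guess whether a stream expects bytes rather than str."""
    # TextIOWrapper and StringIO both derive from TextIOBase.
    if isinstance(stream, io.TextIOBase):
        return False
    # BytesIO, BufferedWriter, FileIO and similar derive from these bases.
    if isinstance(stream, (io.RawIOBase, io.BufferedIOBase)):
        return True
    # Assumption: for other file-like objects, trust a "b" in the mode string.
    mode = getattr(stream, "mode", "")
    return isinstance(mode, str) and "b" in mode
```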
* I don't understand the CHANGELOG.md format, hope this is good enough
* Delete \\\r\n in Literal Strings (ref. section 7.3.4.2 of PDF32000_2008)
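In a PDF literal string, a backslash immediately followed by an
end-of-line marker means the string continues on the next line, and
neither character belongs to the string. A minimal sketch of that rule,
using a hypothetical helper `strip_line_continuations` that is not the
parser's actual code:

```python
import re

# A backslash followed by an end-of-line marker or by any other single byte.
# Matching every escape pair keeps an escaped backslash ("\\\\") from being
# mistaken for the start of a line continuation.
_ESCAPE = re.compile(rb"\\(\r\n|\r|\n|.)", re.DOTALL)


def strip_line_continuations(raw: bytes) -> bytes:
    """Remove backslash + EOL pairs; leave all other escapes untouched."""

    def replace(match: "re.Match[bytes]") -> bytes:
        if match.group(1) in (b"\r\n", b"\r", b"\n"):
            return b""  # line continuation: drop backslash and EOL
        return match.group(0)  # some other escape: keep as-is

    return _ESCAPE.sub(replace, raw)
```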
* Keep Travis CI happy
* Added test
* Remove pdfminer/Changelog
* Prettify _parse_string_1
* Add CHANGELOG.md
* Satisfy flake8
* Update CHANGELOG.md
* Use logging.Logger.warning instead of warnings.warn in most cases, following
the official Python guidance that warnings.warn is directed at _developers_,
not users
* (pdfdocument.py) remove declarations of PDFTextExtractionNotAllowedWarning,
PDFNoValidXRefWarning
* (pdfpage.py) Don't import warnings, don't use PDFTextExtractionNotAllowedWarning
* (tools/dumppdf.py) Don't import warnings, don't use PDFNoValidXRefWarning
* (tests/test_tools_dumppdf.py) Don't import warnings, check for logging.WARN rather
than PDFNoValidXRefWarning
* get name right
* make flake8 happy
* Revert "make flake8 happy"
This reverts commit 4592769686.
* Revert "get name right"
This reverts commit 80091ea211.
* Revert "Use logging.Logger.warning instead of warning.warn in most cases, following"
This reverts commit 3c1e3d6606.
* Revert "Merge branch 'preferLoggingToWarning' into hst"
This reverts commit 9d9d139921, reversing
changes made to 80091ea211.
* Revert "Revert "Merge branch 'preferLoggingToWarning' into hst""
This reverts commit b3da21934d.
Co-authored-by: Henry S. Thompson <ht@home.hst.name>
Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
* Fix .paint_path handling of single line segments
- Fixes typo ("ml" should have been "mlh")
- Removes if-statement that required individual line segments to be
strictly horizontal or vertical.
* Treat 'ml'-shape paths as lines not curves
Although 'mlh' is the canonical implementation for a single line
segment, 'ml' is fairly common.
Adds tests and sample PDF.
* Fix trailing whitespace
* Fix point-extraction from Bézier path commands
This commit corrects the manner in which "pts" are extracted from Bézier
path commands. See Table 4.9 of PDF reference manual, and new comments
in code for details. Previously, depending on the command (c, v, or y),
the code was extracting some combination of control points (not on the
curve) and the actual points on the curve.
This commit also refactors .paint_path, so that apply_matrix_pt is only
called in one place, and to treat the "h" command in a manner more
consistent with other path commands.
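For every operator that takes operands (m, l, c, v, y), the point that
lies on the curve is given by the last two operands, and "h" returns to
the start of the subpath. A rough sketch of that extraction follows; the
function name `on_curve_points` is hypothetical, and the sketch assumes
a single subpath that begins with "m".

```python
from typing import List, Tuple

Point = Tuple[float, float]


def on_curve_points(path: List[tuple]) -> List[Point]:
    """Return one on-curve point per path segment.

    c: x1 y1 x2 y2 x3 y3 -> (x3, y3); the first two pairs are control points
    v: x2 y2 x3 y3       -> (x3, y3)
    y: x1 y1 x3 y3       -> (x3, y3)
    m, l: x y            -> (x, y)
    h: no operands       -> the starting point of the path
    """
    points: List[Point] = []
    for segment in path:
        operator = segment[0]
        if operator == "h":
            # Closing the subpath draws back to its first point.
            points.append(tuple(path[0][-2:]))
        else:
            # The endpoint is always the final operand pair.
            points.append(tuple(segment[-2:]))
    return points
```

With the points collected in one list like this, a transform such as
`apply_matrix_pt` only needs to be applied in a single place.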
* Add comments to test_paint_path_quadrilaterals
* Parse rect-forming mllll paths as rects not curves
Now that .paint_path has been refactored, adding support for
rect-forming mllll paths requires no extra code, beyond a minor tweak to
the relevant elif statement.
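The line/rect/curve distinction discussed in these commits could look
roughly like this. It is an illustrative sketch under assumptions, not
the code in `converter.py`: the function name `classify_path` and the
string return values are hypothetical, and points are assumed to be
tuples with "h" already mapped to the starting point.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def classify_path(shape: str, pts: List[Point]) -> str:
    """Classify a single subpath as 'line', 'rect' or 'curve'."""
    if shape in {"ml", "mlh"}:
        # A single segment, with or without an explicit close.
        return "line"
    if shape in {"mlllh", "mllll"}:
        (x0, y0), (x1, y1), (x2, y2), (x3, y3), last = pts
        is_closed_loop = last == (x0, y0)
        # Sides must alternate between horizontal and vertical.
        is_axis_aligned = (
            x0 == x1 and y1 == y2 and x2 == x3 and y3 == y0
        ) or (y0 == y1 and x1 == x2 and y2 == y3 and x3 == x0)
        if is_closed_loop and is_axis_aligned:
            return "rect"
    # Everything else, including skewed quadrilaterals, stays a curve.
    return "curve"
```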
* One changelog line with ref to mr
* Remove PDFLayoutAnalyzer._create_curve because implementation has become trivial due to refactoring
* Extract variables from if statement to make it easier to read
* Optimize imports order
* Trigger travis build
* Revert "Trigger travis build"
This reverts commit 41c05184.
* Update travis badge
* Update travis badge
Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
Closes #191
* Remove support for non-standard output streams that are not binary by removing the try-except check that writes a unicode character to the stream
* Add docstring
* Fix flake8
* Fix paint_path bug noted in issue #473
Focuses on the handling of non-rect quadrilaterals, the decomposition of
complex (m.*h)* paths into subpaths, and assigning those subpaths the
correct LTCurve/LTRect type.
Also adds a test for cases presented in issue #473
* Tweak paint_path fix per @pietermarsman review
- Adjusts logic to adhere to if-elif-else rather than early returns.
- Shortens subpath detection/reprocessing step, using re.finditer().
* Reorder paint_path() if-else statements once more
* Fix flake8 issues
* Fix error: should select items 1 and 2 from the list, then possibly items 3 and 4, and so on.
Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
* Fix converting path to multiple rectangles
For a path that consists of a series of rectangles
(shape is 'mlllhmlllh...'), call paint_path again with each group of
5 points. The result is multiple rects instead of a single curve.
Fixes #369
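A small sketch of how a compound path might be cut into subpaths before
each one is painted separately. The helper name `split_subpaths` and the
pattern `m[^m]+` are assumptions for illustration; a later commit in
this log mentions re.finditer() for this step, but its exact pattern is
not given here.

```python
import re
from typing import List


def split_subpaths(path: List[tuple]) -> List[List[tuple]]:
    """Split a path into subpaths, each starting at an 'm' operator."""
    # One letter per segment, so string offsets equal list indices.
    shape = "".join(segment[0] for segment in path)
    return [
        path[match.start():match.end()]
        for match in re.finditer(r"m[^m]+", shape)
    ]
```

For a shape of 'mlllhmlllh' this yields two groups of five segments,
each of which can then be handled as its own rectangle.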
* Reduce pdf size by removing font
* Add unittest for PDFLayoutAnalyzer.paint_path()
* Add line to CHANGELOG.md
* Add reference to pdf reference manual
* Cleanup function paint_path a bit
* Reduce line length of tests
* Reduce line length of tests
Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>