Commit Graph

6 Commits (b84cfc98e0e48f8dbc5099a6aaa71a6fc7e52895)

Author SHA1 Message Date
Pieter Marsman b84cfc98e0
Update development tools: travis ci to github actions, tox to nox, nose to pytest (#704)
* Replace tox with nox

* Replace travis with github actions

* Fix pytest, mypy and flake8 errors

* Add pytest.

* Run on all commits

* Remove nose

* Speedup slow tests to save GitHub actions minutes

* Added line to CHANGELOG.md

* Fix line too long in pdfdocument.py

* Update .github/workflows/actions.yml

Co-authored-by: Jake Stockwin <jstockwin@gmail.com>

* Improve actions.yml

* Fix error with nox name for mypy

* Add names for jobs

* Replace nose.raises with pytest.raises

Co-authored-by: Jake Stockwin <jstockwin@gmail.com>
2022-02-02 22:24:32 +01:00
htInEdin 33d7dde4d1
Fix bug: _is_binary_stream should recognize TextIOWrapper as non-binary, escaped \r\n should be removed (#616)
* detect TextIOWrapper as non-binary

* I don't understand the CHANGELOG.md format, hope this is good enough

* Delete \\\r\n in Literal Strings (ref. section 7.3.4.2 of PDF32000_2008)

* Keep Travis CI happy

* Added test

* Remove pdfminer/Changelog

* Prettify _parse_string_1

* Add CHANGELOG.md

* Satisfy flake8

* Update CHANGELOG.md

* Use logging.Logger.warning instead of warning.warn in most cases, following
 the Python official guidance that warning.warn is directed at _developers_,
 not users

 * (pdfdocument.py) remove declarations of PDFTextExtractionNotAllowedWarning,
			PDFNoValidXRefWarning

 * (pdfpage.py) Don't import warning, don't use PDFTextExtractionNotAllowedWarning

 * (tools/dumppdf.py) Don't import warning, don't use PDFNoValidXRefWarning

 * (tests/test_tools_dumppdf.py) Don't import warning, check for logging.WARN rather
				  than PDFNoValidXRefWarning

* get name right

* make flake8 happy

* Revert "make flake8 happy"

This reverts commit 4592769686.

* Revert "get name right"

This reverts commit 80091ea211.

* Revert "Use logging.Logger.warning instead of warning.warn in most cases, following"

This reverts commit 3c1e3d6606.

* Revert "Merge branch 'preferLoggingToWarning' into hst"

This reverts commit 9d9d139921, reversing
changes made to 80091ea211.

* Revert "Revert "Merge branch 'preferLoggingToWarning' into hst""

This reverts commit b3da21934d.

Co-authored-by: Henry S. Thompson <ht@home.hst.name>
Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
2021-09-27 20:30:40 +02:00
Jeremy Singer-Vine 016239c146
Fix .paint_path handling of single line segments (#530)
* Fix .paint_path handling of single line segments

- Fixes typo ("ml" should have been "mlh")

- Removes if-statement that required individual line segments to be
  strictly horizontal or vertical.

* Treat 'ml'-shape paths as lines not curves

Althoguh 'mlh' is the canonical implementation for a single line
segment, 'ml' is fairly common.

Adds tests and sample PDF.

* Fix trailing whitespace

* Fix point-extraction from Beziér path commands

This commit corrects the manner in which "pts" are extracted from Beziér
path commands. See Table 4.9 of PDF reference manual, and new comments
in code for details. Previously, depending on whether the command (c,
v, or y) the code was extracting some combination of control points (not
on curve) and the actual points-on-curve.

This commit also refactors .paint_path, so that apply_matrix_pt is only
called in one place, and to treat the "h" command in a manner more
consistent with other path commands.

* Add comments to test_paint_path_quadrilaterals

* Parse rect-forming mllll paths as rects not curves

Now that .paint_path has been refactored, adding support for
rect-forming mllll paths requires no extra code, beyond a minor tweak to
the relevant elif statement.

* One changelog line with ref to mr

* Remove PDFLayoutAnalyzer._create_curve because implementation has become trivial due to refactoring

* Extract variables from if statement to make it easier to read

* Optimize imports order

* Trigger travis build

* Revert "Trigger travis build"

This reverts commit 41c05184

* Update travis badge

* Update travis badge

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
2021-07-27 18:27:32 +02:00
Pieter Marsman f8e6ad6ac1
Remove supoprt for non standard output streams that are not binary by removing the try-except check that writes a unicode character to the stream (#523)
Closes #191 

* Remove supoprt for non standard output streams that are not binary by removing the try-except check that writes a unicode character to the stream

* Add docstring

* Fix flake8
2020-10-25 14:37:12 +01:00
Jeremy Singer-Vine e83dd26671
Fix .paint_path for non-rectangle quadrilaterals (#512)
* Fix paint_path bug noted in issue #473

Focuses on the handling of non-rect quadrilaterals, the decomposition of
complex (m.*h)* paths into subpaths, and assigning those subpaths the
correct LTCurve/LTRect type.

Also adds a test for cases presented in issue #473

* Tweak paint_path fix per @pietermarsman review

- Adjusts logic to adhere to if-elif-else rather than early returns.

- Shortens subpath detection/reprocessing step, using re.finditer().

* Reorder paint_path() if-else statements once more

* Fix flake8 issues

* Fix error: should select item 1 and 2 from the list, and possible items [3, 4], and so on.

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
2020-10-12 17:53:00 +02:00
Kwok-kuen Cheung 60863cfd55
Fix converting path to multiple rectangles (#371)
* Fix converting path to multiple rectangles

For path that consists of a series of rectangles
(shape is 'mlllhmlllh...'), call paint_path again with each group of
5 points. The result is multiple rects instead of a single curve.

fixes #369

* Reduce pdf size by removing font

* Add unittest for PDFLayoutAnalyzer.paint_path()

* Add line to CHANGELOG.md

* Add reference to pdf reference manual

* Cleanup function paint_path a bit

* Reduce line length of tests

* Reduce line length of tests

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
2020-07-11 17:34:38 +02:00