
10 Commits (b4054ff4cfb52ae7a2ca44e6f2d9575869222a22)

Author SHA1 Message Date
Kwok-kuen Cheung 60863cfd55
Fix converting path to multiple rectangles (#371)
* Fix converting path to multiple rectangles

For a path that consists of a series of rectangles
(shape 'mlllhmlllh...'), call paint_path again for each group of
5 points. The result is multiple rects instead of a single curve.

fixes #369

* Reduce pdf size by removing font

* Add unittest for PDFLayoutAnalyzer.paint_path()

* Add line to CHANGELOG.md

* Add reference to pdf reference manual

* Cleanup function paint_path a bit

* Reduce line length of tests

* Reduce line length of tests

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
2020-07-11 17:34:38 +02:00
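The rectangle splitting described in #371 can be sketched in plain Python. This is a minimal illustration, not pdfminer.six's `paint_path()`. It assumes path segments are tuples such as `('m', x, y)`, `('l', x, y)` and `('h',)`, and the helpers `split_rectangle_groups` and `group_bbox` are hypothetical names.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical segment layout for this sketch: ('m', x, y), ('l', x, y) or ('h',)
Segment = Tuple


def split_rectangle_groups(path: Sequence[Segment]) -> List[List[Segment]]:
    """Split a path shaped like 'mlllhmlllh...' into groups of 5 segments.

    Any other shape is returned unchanged as a single group.
    """
    shape = "".join(segment[0] for segment in path)
    if re.fullmatch(r"(mlllh)+", shape):
        return [list(path[i:i + 5]) for i in range(0, len(path), 5)]
    return [list(path)]


def group_bbox(group: Sequence[Segment]) -> Tuple[float, float, float, float]:
    """Bounding box (x0, y0, x1, y1) of the points in one group; 'h' has none."""
    points = [(seg[1], seg[2]) for seg in group if seg[0] in ("m", "l")]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Two unit squares drawn as one path: 'mlllhmlllh'
path = [
    ("m", 0, 0), ("l", 1, 0), ("l", 1, 1), ("l", 0, 1), ("h",),
    ("m", 2, 0), ("l", 3, 0), ("l", 3, 1), ("l", 2, 1), ("h",),
]
for group in split_rectangle_groups(path):
    print(group_bbox(group))
```

In the commit, each 5-point group is passed to paint_path again, so the layout analysis sees two separate rectangles here rather than one curve.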
madhurcodes 6a9269b432
Change 'Text extraction is not allowed' error to warning (#453)
* Changed error to warning for 'Text extraction is not allowed'

* updated changelog

* fix lint

* made changes suggested in review

* Update CHANGELOG.md

* Add regression test for failing pdf

* Reduce line length to <80

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
2020-07-11 16:04:11 +02:00
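The change in #453 follows a common pattern: report a recoverable condition as a warning instead of aborting. Below is a minimal sketch using the standard `warnings` module. `ExtractionNotAllowedError`, `check_extraction_permission` and the `strict` flag are hypothetical names for illustration, not pdfminer.six's actual API.

```python
import warnings


class ExtractionNotAllowedError(Exception):
    """Hypothetical error type used only in this sketch."""


def check_extraction_permission(is_extractable: bool, strict: bool = False) -> None:
    """Warn (or, when strict, raise) if the document forbids text extraction."""
    if is_extractable:
        return
    message = "Text extraction is not allowed"
    if strict:
        # Old behaviour: abort processing entirely
        raise ExtractionNotAllowedError(message)
    # New behaviour: tell the caller, but keep extracting
    warnings.warn(message)
```

Callers who still want the old hard failure can opt in with `strict=True`. Everyone else gets the text along with a warning they can filter or escalate.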
Pieter Marsman 1c3047b68b
Remove samples/ directory from source distribution to prevent downloading all PDFs when installing pdfminer.six (#364)
Fixes #363 

* Remove samples/ and docs/ from source distribution. The samples/ directory contains PDFs for testing purposes, and docs/ contains the Read the Docs documentation, which is published online.

* Remove issue-00152-embedded-pdf.pdf because it contains a possible exploit.

See https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Exploit%3AJS%2FShellCode.gen
And https://github.com/pdfminer/pdfminer.six/issues/363

* Added line to CHANGELOG.md

* Remove unused imports
2020-01-24 12:36:02 +01:00
Pieter Marsman 2f7f5d2667
Fall back on the backwards-compatible key (F) for embedded file URLs when the Unicode URL (UF) does not exist (#338)
* Fix getting filename when extracting embedded files

* Add test for pdf that contains embedded pdf, and fix additional errors in looping over multiple xrefs

* Add line to CHANGELOG
2020-01-16 22:11:42 +01:00
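The fallback in #338 amounts to preferring /UF and using /F only when /UF is absent. Below is a minimal sketch. It assumes the file specification dictionary is available as a plain Python dict with str keys, and `embedded_file_name` is a hypothetical helper, not pdfminer.six's API.

```python
from typing import Any, Mapping, Optional


def embedded_file_name(filespec: Mapping[str, Any]) -> Optional[Any]:
    """Return the embedded file's name, preferring the Unicode /UF entry."""
    name = filespec.get("UF")
    if name is None:
        # Older writers only emit the backwards-compatible /F entry
        name = filespec.get("F")
    return name
```

Checking for `None` explicitly, rather than using `or`, keeps a present but empty /UF value from silently switching to /F.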
Recursing 0b1741b9bf
Pack the /P (permissions) entry of the /Encrypt dictionary in the file trailer as an unsigned long (#352)
Fixes #186 

* Treat the permissions (the /P entry) as unsigned long, fix #186

* handle negative values for p

* Extract function for resolving a two's complement

* Add test for issue #352

* Add line to CHANGELOG.md

* Only ints can be converted to a uint using two's-complement method

* Standardize import style; multiple imports from same module on one line

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
2020-01-07 21:59:13 +01:00
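The /P entry is a 32-bit flag field that is often written as a negative integer because its high bits are set. Reinterpreting it as unsigned via two's complement can be sketched as follows. `to_uint32` is a hypothetical name, not the helper added in #352. The bit test assumes bit 5 (1-based) is the "copy or extract text" permission, as in the PDF specification's permission table.

```python
import struct


def to_uint32(value: int) -> int:
    """Reinterpret a signed 32-bit /P value as unsigned (two's complement)."""
    # Mirrors the commit's rule that only ints can be converted
    if not isinstance(value, int):
        raise TypeError("only ints can be converted, got %s" % type(value).__name__)
    if not -(1 << 31) <= value < (1 << 32):
        raise ValueError("value does not fit in 32 bits: %d" % value)
    return value & 0xFFFFFFFF


p = -4  # bits 1 and 2 cleared, every other bit set
unsigned_p = to_uint32(p)
# Cross-check against the struct module's signed -> unsigned reinterpretation
assert unsigned_p == struct.unpack("<I", struct.pack("<i", p))[0]
can_extract_text = bool(unsigned_p & (1 << 4))  # bit 5 (1-based)
print(unsigned_p, can_extract_text)
```

Masking with `0xFFFFFFFF` handles negative and already-unsigned inputs uniformly. The `struct` round-trip is included only as an independent check of the same reinterpretation.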
Pieter Marsman 1c4a4167ed
Fix failing test on develop & cleaning up test files (#319)
2019-10-26 18:42:33 +02:00
jbarlow83 733ddf7e57
Added: tests for extracting text from pdfs with Type3 fonts (#205)
2019-10-22 18:15:59 +02:00
Pieter Marsman 373c6e7b97
Added: extraction of JBIG2 encoded images (#311)
Also added a test for a pdf with a JBIG2 image.

Fixes #26 
Closes #46
2019-10-22 17:37:06 +02:00
Philippe Guglielmetti 82af7f0aac
issue #56 reproduced, solution attempt unsuccessful
2017-04-19 14:19:14 +02:00
Philippe Guglielmetti 7055862eaf
solves https://github.com/pdfminer/pdfminer.six/issues/50
2017-04-18 18:20:31 +02:00