Commit Graph

64 Commits (ac2b20a79a62ea1791cf25295611d9a3100679f2)

Author SHA1 Message Date
Jake Stockwin 7254530d27
Fix ordering of textlines within a textbox when boxes_flow is disabled (#412)
* Fix ordering of textlines within a textbox when boxes_flow is disabled

* Add new test PDF sample
2020-05-09 15:37:49 +02:00
Pieter Marsman 1c3047b68b
Remove samples/ directory from source distribution to prevent downloading all pdf's when installing pdfminer.six (#364)
Fixes #363 

* Remove samples/ and docs/ from source distribution. The samples/ dictionairy contains pdf's for testing purposes and the docs/ contain readthedocs documentation and is published online.

* Remove issue-00152-embedded-pdf.pdf because it contains a possible exploit.

See https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Exploit%3AJS%2FShellCode.gen
And https://github.com/pdfminer/pdfminer.six/issues/363

* Added line to CHANGELOG.md

* Remove unused imports
2020-01-24 12:36:02 +01:00
Pieter Marsman fff3ac2ba6
Fix bug in computing character bounding box (#348)
* Remove scaling font height/width with size of font bounding box

* Refactor LTChar bounding box computation

* Change expected outcome of `python tools/pdf2txt.py samples/simple3.pdf`, because it looks like an improvement. However, when I view `samples/simple3.pdf` I don't see any text at all. The change in expected outcome is explained by the fact that the bounding boxes of characters can be different, depending on the `/FontBBox` parameter of the font.

* Add test for font sizes, and for this a high-level function that returns an iterator of LTPage objects

* Add line to CHANGELOG
2020-01-16 22:15:50 +01:00
Pieter Marsman 2f7f5d2667
Fallback on backwards-compatible key (F) for embedded files URL's when the unicode URL (UF) does not exist (#338)
* Fix getting filename when extracting embedded files

* Add test for pdf that contains embedded pdf, and fix additional errors in looping over multiple xrefs

* Add line to CHANGELOG
2020-01-16 22:11:42 +01:00
Recursing 0b1741b9bf Pack the /P (ermissions) entry from the /Encrypt dictionionary in the file trailer, as unsigned long (#352)
Fixes #186 

* Tread the permissions (the /P entry) as unsigned long, fix #186

* handle negative values for p

* Extract function for resolving an twos-complement

* Add test for issue #352

* Add line to CHANGELOG.md

* Only ints can be converted to a uint using two's-complement method

* Standardize import style; multiple imports from same module on one line

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
2020-01-07 21:59:13 +01:00
Pieter Marsman 3502dc9f3b
Drop support for legacy Python 2 (#346)
* Drop support for legacy Python 2

* Add python_requires to help pip

* Upgrade Python syntax with pyupgrade

* Upgrade Python syntax with pyupgrade --py3-plus

* Python 3 imports

* Replace six

* Update CONTRIBUTING.md

* Added line to changelog

Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
2020-01-04 16:47:07 +01:00
Pieter Marsman 1c4a4167ed
Fix failing test on develop & cleaning up test files (#319) 2019-10-26 18:42:33 +02:00
jbarlow83 733ddf7e57 Added: tests for extracting tests from pdfs with Type3 fonts (#205) 2019-10-22 18:15:59 +02:00
Pieter Marsman 373c6e7b97
Added: extraction of JBIG2 encoded images (#311)
And added test for pdf with JBIG2 image.

Fixes #26 
Closes #46
2019-10-22 17:37:06 +02:00
Fakabbir Amin 5b210981c9 Adds Test Case 2019-08-10 10:19:20 +05:30
Sebastian Schuberth ec8530f6cf Add a test for the previous fix 2017-10-16 12:35:16 +02:00
Philippe Guglielmetti b010db6049 solves https://github.com/pdfminer/pdfminer.six/issues/65 2017-07-20 21:17:06 +02:00
Philippe Guglielmetti 82af7f0aac issue #56 reproduced, solution attempt unsucessful 2017-04-19 14:19:14 +02:00
Philippe Guglielmetti 7055862eaf solves https://github.com/pdfminer/pdfminer.six/issues/50 2017-04-18 18:20:31 +02:00
Daniel Berthereau 10815bff7b Fixed tests. 2016-06-27 00:00:00 +02:00
cybjit 2ee7153f6e add python3 in sample Makefile 2014-09-16 22:56:13 +02:00
Yusuke Shinyama 2e900e5d10 Fixed for consistent test results. (hopefully...) 2014-06-26 17:41:31 +09:00
Yusuke Shinyama a3ab6c253b Fixed: loose autotesting. 2014-06-25 19:50:20 +09:00
Yusuke Shinyama 8f9c4dedff Test rig cleanup. 2014-06-15 11:41:30 +09:00
Yusuke Shinyama a8ec99a848 More autotest tweaks. 2014-06-15 10:52:59 +09:00
Yusuke Shinyama fb3f2d9629 Further test tweaks. 2014-06-14 12:00:31 +09:00
Yusuke Shinyama a7489aaabe Fixed: autotests 2014-06-14 10:54:40 +09:00
numion a4997d6f10 Implement revision 4 and 5 encryption handler. 2014-05-19 16:27:43 +02:00
Yusuke Shinyama c8b6d4112a Fixed: crash with negative layout bbox. 2013-11-09 15:10:14 +09:00
Matthew Duggan f02cb11945 Update test references based on recent layout analysis improvements 2013-11-07 17:44:09 +09:00
Yusuke Shinyama 56917a213c testcase updated 2011-05-15 01:22:51 +09:00
Yusuke Shinyama e8cd880409 testdata changed 2011-02-27 19:48:22 +09:00
yusuke.shinyama.dummy 5d98a27d9c test cases updated
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@282 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:41:11 +00:00
yusuke.shinyama.dummy 509ab66319 stay with python2
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy 607d4734db update test cases
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@255 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:28 +00:00
yusuke.shinyama.dummy 3305c07ba2 layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@245 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:39 +00:00
yusuke.shinyama.dummy 0944cfaded test file simple3.pdf added.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@240 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:41 +00:00
yusuke.shinyama.dummy 83d2086f19 fix minor layout issue
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@239 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:31 +00:00
yusuke.shinyama.dummy f5aff374fc some wordings and documentations
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@229 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 03:56:50 +00:00
yusuke.shinyama.dummy f2005bee55 non-free sample files moved into a separate directory
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@227 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-13 04:35:18 +00:00
yusuke.shinyama.dummy aa7e7d3e35 add a README file to show credits of the sample files.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@223 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-06 05:16:37 +00:00
yusuke.shinyama.dummy 836eb37b47 test reference results changed
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@204 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:29:40 +00:00
yusuke.shinyama.dummy 5f822f6dcb improved layout analysis.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@197 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-26 11:11:35 +00:00
yusuke.shinyama.dummy 2e5b92c18a writing mode detection
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@196 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-25 11:38:47 +00:00
yusuke.shinyama.dummy 40b36a7c42 consistent test results
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@191 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 06:04:54 +00:00
yusuke.shinyama.dummy fa13122f09 add regression tests.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@189 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 04:34:52 +00:00
yusuke.shinyama.dummy cd39642abe code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@188 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 04:00:18 +00:00
yusuke.shinyama.dummy 2dee2efad9 apply more patches
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@181 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-13 15:00:43 +00:00
yusuke.shinyama.dummy 0f8fe3f19e Page rotation bug fixed.
Various minor fixes.


git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@176 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-31 02:09:28 +00:00
yusuke.shinyama.dummy 77986b8273 fix CMapDB initialization stuff. more code cleanup.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@148 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-03 13:39:34 +00:00
yusuke.shinyama.dummy 78f7866554 sgml to xml
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@146 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-31 03:04:56 +00:00
yusuke.shinyama.dummy e8b1309e76 testcase added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@140 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-24 02:50:07 +00:00
yusuke.shinyama.dummy 57025ee632 git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@122 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-07-21 16:06:50 +00:00
yusuke.shinyama.dummy 8a5bec5065 layout analysis improved.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@120 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-07-21 07:55:19 +00:00
yusuke.shinyama.dummy fee33266aa test added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@111 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-05-17 14:05:41 +00:00