Commit Graph

4 Commits (bab6d154c2ab698bc79a8fa7638ec8e6b3ce809c)

Author SHA1 Message Date
Pieter Marsman fff3ac2ba6
Fix bug in computing character bounding box (#348)
* Remove scaling font height/width with size of font bounding box

* Refactor LTChar bounding box computation

* Change expected outcome of `python tools/pdf2txt.py samples/simple3.pdf`, because it looks like an improvement. However, when I view `samples/simple3.pdf` I don't see any text at all. The change in expected outcome is explained by the fact that the bounding boxes of characters can be different, depending on the `/FontBBox` parameter of the font.

* Add test for font sizes, and for this a high-level function that returns an iterator of LTPage objects

* Add line to CHANGELOG
2020-01-16 22:15:50 +01:00
Pieter Marsman f3ab1bc61e
Enforce pep8 coding-style (#345)
* Code Refractor: Use code-style enforcement #312

* Add flake8 to travis-ci

* Remove python 2 3 comment on six library. 891 errors > 870 errors.

* Remove class and functions comments that consist of just the name. 870 errors > 855 errors.

* Fix flake8 errors in pdftypes.py. 855 errors > 833 errors.

* Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting

* Cleanup pdfinterp.py and add documentation from PDF Reference

* Cleanup pdfpage.py

* Cleanup pdffont.py

* Clean psparser.py

* Cleanup high_level.py

* Cleanup layout.py

* Cleanup pdfparser.py

* Cleanup pdfcolor.py

* Cleanup rijndael.py

* Cleanup converter.py

* Rename klass to cls if it is the class variable, to be more consistent with standard practice

* Cleanup cmap.py

* Cleanup pdfdevice.py

* flake8 ignore fontmetrics.py

* Cleanup test_pdfminer_psparser.py

* Fix flake8 in pdfdocument.py; 339 errors to go

* Fix flake8 utils.py; 326 errors togo

* pep8 correction for few files in /tools/ 328 > 160 to go (#342)

* pep8 correction for few files in /tools/ 328 > 160 to go

* pep8 correction: 160 > 5 to go

* Fix ascii85.py errors

* Fix error in getting index from target that does not exists

* Remove commented print lines

* Fix flake8 error in pdfinterp.py

* Fix python2 specific error by removing argument from print statement

* Ignore invalid python2 syntax

* Update contributing.md

* Added changelog

* Remove unused import

Co-authored-by: Fakabbir Amin <f4amin@gmail.com>
2019-12-29 21:20:20 +01:00
Pieter Marsman 2bee7d8dcf
Fix wrong ordering of grouping textboxes introduced by #315. The first grouping of textboxes should be skipped if there are intermediate textboxes. (#335)
Fixes #334
2019-11-10 12:18:49 +01:00
Igor Moura 40aa2533c9 Added: simple wrapper to extract text from pdf (#330)
Fixes #327
2019-11-07 07:54:10 +01:00