Commit Graph

26 Commits (bc494ff03cc2d1139a7806a8ca74a477a67f14e9)

Author SHA1 Message Date
Pieter Marsman bc494ff03c Bump version to 20200121 2020-01-21 21:13:52 +01:00
Pieter Marsman 52da65d5eb
Remove latin2ascii.py because it converts the Latin-interpreted bytes of a file to ASCII, which has little to do with PDFs. (#360)
* Remove latin2ascii.py because it converts the Latin-interpreted bytes of a file to ASCII, which has little to do with PDFs.

* Added line to CHANGELOG.md
2020-01-16 22:26:01 +01:00
Pieter Marsman 410d7ecac3
Fix value for font-family in html by removing the subset tag from the PDF font-name (#357)
* Fix font name by removing subset tag

* Added line to CHANGELOG.md

* Add documentation and clear variable name

* Use `html.escape()` to encode strings for html and always return `str` instead of `bytes`
2020-01-16 22:25:20 +01:00
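A minimal sketch of the idea in the commit above, not the project's actual code. It assumes a subset tag is six uppercase letters followed by `+` (as the PDF specification describes for font subsets); the function name `font_family_for_html` is hypothetical.

```python
import html
import re

# Assumption: a font subset tag is exactly six uppercase letters plus "+",
# e.g. "ABCDEF+Helvetica".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def font_family_for_html(pdf_font_name):
    """Return a font name suitable for an HTML font-family value.

    Hypothetical helper: strips a leading subset tag, escapes HTML
    special characters, and always returns str (never bytes).
    """
    if isinstance(pdf_font_name, bytes):
        # latin-1 maps every byte to a character, so decoding cannot fail.
        pdf_font_name = pdf_font_name.decode("latin-1")
    return html.escape(SUBSET_TAG.sub("", pdf_font_name))
```

Names without a subset tag pass through unchanged apart from the escaping.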
Pieter Marsman fff3ac2ba6
Fix bug in computing character bounding box (#348)
* Remove scaling font height/width with size of font bounding box

* Refactor LTChar bounding box computation

* Change expected outcome of `python tools/pdf2txt.py samples/simple3.pdf`, because it looks like an improvement. However, when I view `samples/simple3.pdf` I don't see any text at all. The change in expected outcome is explained by the fact that the bounding boxes of characters can be different, depending on the `/FontBBox` parameter of the font.

* Add test for font sizes, and for this a high-level function that returns an iterator of LTPage objects

* Add line to CHANGELOG
2020-01-16 22:15:50 +01:00
Pieter Marsman 2f7f5d2667
Fall back on the backwards-compatible key (F) for embedded files' URLs when the unicode URL (UF) does not exist (#338)
* Fix getting filename when extracting embedded files

* Add test for pdf that contains embedded pdf, and fix additional errors in looping over multiple xrefs

* Add line to CHANGELOG
2020-01-16 22:11:42 +01:00
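The fallback described in the commit above can be sketched as follows. This is an illustration under assumptions: the file specification is modelled as a plain `dict`, and `embedded_filename` is a hypothetical name rather than a function from the project.

```python
def embedded_filename(filespec):
    """Pick the file name from a file specification dictionary.

    Hypothetical helper: prefers the unicode key "UF" and falls back on
    the backwards-compatible key "F" when "UF" is absent.
    """
    if "UF" in filespec:
        name = filespec["UF"]
    elif "F" in filespec:
        name = filespec["F"]
    else:
        raise KeyError("file specification has neither UF nor F")
    if isinstance(name, bytes):
        # Assumption for this sketch: byte strings are decoded as latin-1.
        name = name.decode("latin-1")
    return name
```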
Recursing 0b1741b9bf Pack the /P (permissions) entry from the /Encrypt dictionary in the file trailer as unsigned long (#352)
Fixes #186 

* Treat the permissions (the /P entry) as unsigned long, fix #186

* handle negative values for p

* Extract function for resolving a two's complement

* Add test for issue #352

* Add line to CHANGELOG.md

* Only ints can be converted to a uint using two's-complement method

* Standardize import style; multiple imports from same module on one line

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
2020-01-07 21:59:13 +01:00
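A sketch of the two's-complement step described in the commit above, assuming a 32-bit permissions value and little-endian packing. The function names are hypothetical and the project's own implementation may differ.

```python
import struct


def resolve_twos_complement_32(value):
    """Reinterpret a signed 32-bit integer as its unsigned equivalent.

    Only ints can be converted this way; anything else is rejected.
    """
    if not isinstance(value, int):
        raise TypeError("only ints can be converted to an unsigned value")
    if value < 0:
        # Adding 2**32 maps -1 to 0xFFFFFFFF, -4 to 0xFFFFFFFC, and so on.
        value += 1 << 32
    return value


def pack_permissions(p):
    """Pack a /P value as a 4-byte unsigned long (little-endian assumed)."""
    return struct.pack("<L", resolve_twos_complement_32(p))
```

Without the conversion, `struct.pack("<L", -4)` raises `struct.error`, because a negative number is out of range for an unsigned format.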
Pieter Marsman b27d3d0aff Bump version 2020-01-04 18:15:15 +01:00
Pieter Marsman 3502dc9f3b
Drop support for legacy Python 2 (#346)
* Drop support for legacy Python 2

* Add python_requires to help pip

* Upgrade Python syntax with pyupgrade

* Upgrade Python syntax with pyupgrade --py3-plus

* Python 3 imports

* Replace six

* Update CONTRIBUTING.md

* Added line to changelog

Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
2020-01-04 16:47:07 +01:00
Pieter Marsman f3ab1bc61e
Enforce pep8 coding-style (#345)
* Code Refactor: Use code-style enforcement #312

* Add flake8 to travis-ci

* Remove python 2 3 comment on six library. 891 errors > 870 errors.

* Remove class and functions comments that consist of just the name. 870 errors > 855 errors.

* Fix flake8 errors in pdftypes.py. 855 errors > 833 errors.

* Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before committing

* Cleanup pdfinterp.py and add documentation from PDF Reference

* Cleanup pdfpage.py

* Cleanup pdffont.py

* Clean psparser.py

* Cleanup high_level.py

* Cleanup layout.py

* Cleanup pdfparser.py

* Cleanup pdfcolor.py

* Cleanup rijndael.py

* Cleanup converter.py

* Rename klass to cls if it is the class variable, to be more consistent with standard practice

* Cleanup cmap.py

* Cleanup pdfdevice.py

* flake8 ignore fontmetrics.py

* Cleanup test_pdfminer_psparser.py

* Fix flake8 in pdfdocument.py; 339 errors to go

* Fix flake8 utils.py; 326 errors to go

* pep8 correction for few files in /tools/ 328 > 160 to go (#342)

* pep8 correction for few files in /tools/ 328 > 160 to go

* pep8 correction: 160 > 5 to go

* Fix ascii85.py errors

* Fix error in getting index from target that does not exist

* Remove commented print lines

* Fix flake8 error in pdfinterp.py

* Fix python2 specific error by removing argument from print statement

* Ignore invalid python2 syntax

* Update contributing.md

* Added changelog

* Remove unused import

Co-authored-by: Fakabbir Amin <f4amin@gmail.com>
2019-12-29 21:20:20 +01:00
Pieter Marsman 803a7d9598 Release 20191110 2019-11-10 12:29:14 +01:00
Pieter Marsman 2bee7d8dcf
Fix wrong ordering of grouping textboxes introduced by #315. The first grouping of textboxes should be skipped if there are intermediate textboxes. (#335)
Fixes #334
2019-11-10 12:18:49 +01:00
Pieter Marsman 5c6fa8f986 Release 20191107 2019-11-07 21:52:44 +01:00
Pieter Marsman bc034c8e59
Create sphinx documentation for Read the Docs (#329)
Fixes #171
Fixes #199
Fixes #118
Fixes #178
Added: tests for building documentation and example code in documentation
Added: docstrings for commonly used functions and classes
Removed: old documentation
2019-11-07 21:12:34 +01:00
Igor Moura 40aa2533c9 Added: simple wrapper to extract text from pdf (#330)
Fixes #327
2019-11-07 07:54:10 +01:00
Pieter Marsman 027bb62943 Merge branch 'develop' of github.com:pdfminer/pdfminer.six into develop 2019-11-06 21:51:41 +01:00
Pieter Marsman 548b933a84 Add line to CHANGELOG.md for #325 2019-11-06 21:51:34 +01:00
Pieter Marsman 33b16b3f07
Deprecate the use of _py2_no_more_posargs (#328)
Fixes #324
2019-11-02 10:29:39 +01:00
Jianfeng 44b223cf0a Speedup grouping of textboxes (#315)
Changed: using a heap instead of a SortedList and avoid rebuilding the heap in each iteration
Changed: avoid potentially huge number of variable assignments in list comprehension.
Changed: avoid repeatedly evaluating `obj is obj` in list comprehension by storing id(obj).
2019-10-31 09:22:58 +01:00
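The changes listed in the commit above can be illustrated with a simplified, generic sketch. It is not the project's grouping code: `group_closest_first`, `dist`, and `merge` are hypothetical names. The heap is built once and only extended, and outdated entries are skipped by checking stored `id(obj)` values.

```python
import heapq
from itertools import combinations


def group_closest_first(items, dist, merge):
    """Repeatedly merge the two closest items until one remains.

    Simplified illustration: pair distances live in a single heap that is
    never rebuilt, and stale pairs are detected through the ids of the
    objects that are still ungrouped.
    """
    heap = []
    counter = 0  # unique tie-breaker, so the heap never compares items
    for a, b in combinations(items, 2):
        heapq.heappush(heap, (dist(a, b), counter, a, b))
        counter += 1
    alive = {id(obj): obj for obj in items}
    while len(alive) > 1:
        _, _, a, b = heapq.heappop(heap)
        if id(a) not in alive or id(b) not in alive:
            continue  # one side was already merged; entry is stale
        del alive[id(a)]
        del alive[id(b)]
        merged = merge(a, b)
        for other in alive.values():
            heapq.heappush(heap, (dist(merged, other), counter, merged, other))
            counter += 1
        alive[id(merged)] = merged
    return next(iter(alive.values()), None)
```

Looking up `id(obj)` in a dict is a constant-time check, whereas scanning a list for the same object costs time proportional to the list length.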
Pieter Marsman 6cc78ee124
Replace opts by argparse in dumppdf.py (#321)
Also add multi-character argument names
Fixes #175
2019-10-27 21:40:04 +01:00
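As an illustration of what the commit above describes, the sketch below shows short options paired with multi-character names in `argparse`. The specific flags (`--all`, `--password`) are chosen for the example and are assumptions, not a statement of dumppdf.py's real interface.

```python
import argparse


def build_parser():
    """Build an example parser with short and multi-character option names."""
    parser = argparse.ArgumentParser(description="Example PDF dump tool")
    parser.add_argument("files", nargs="+", help="one or more PDF paths")
    # Illustrative flags only; each has a short and a long spelling.
    parser.add_argument("-a", "--all", action="store_true",
                        help="process every object")
    parser.add_argument("-P", "--password", default="",
                        help="password for encrypted files")
    return parser
```

Calling `build_parser().parse_args(["-a", "--password", "secret", "sample.pdf"])` yields a namespace with `all=True`, `password="secret"`, and `files=["sample.pdf"]`.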
Pieter Marsman d88d6020a2
Remove webapp and other (un)helpful application references: django, cgi, and pyinstaller. (#320)
Fixes #314 
Fixes #105
2019-10-26 19:16:37 +02:00
Pieter Marsman a238a19999
Fix assertionerror when dumping pdf with reference to objid 0 (#318)
Fixes #94 
Added: test to check if `PDFObjectNotFound` error is raised if objid 0 is requested.
2019-10-25 22:49:58 +02:00
Pieter Marsman 373c6e7b97
Added: extraction of JBIG2 encoded images (#311)
And added test for pdf with JBIG2 image.

Fixes #26 
Closes #46
2019-10-22 17:37:06 +02:00
Pieter Marsman 694aa508c3 Release 20191020 2019-10-20 14:21:48 +02:00
Pieter Marsman adc4726e06
Add warning about dropping python2 support (#307)
Fix #303
2019-10-20 13:59:29 +02:00
Pieter Marsman dd7dc7b684 Use keep-a-changelog format for CHANGELOG.md and add unreleased additions, fixes and changes. 2019-10-20 12:32:11 +02:00
ganeshtata 90d61f2a3a DOCS: changelog for version 20181108 added 2018-11-08 22:08:17 +05:30