* Fix font name by removing subset tag
* Added line to CHANGELOG.md
* Add documentation and clear variable name
* Use `html.escape()` to encode strings for html and always return `str` instead of `bytes`
* Remove scaling font height/width with size of font bounding box
* Refactor LTChar bounding box computation
* Change expected outcome of `python tools/pdf2txt.py samples/simple3.pdf`, because it looks like an improvement. However, when I view `samples/simple3.pdf` I don't see any text at all. The change in expected outcome is explained by the fact that the bounding boxes of characters can be different, depending on the `/FontBBox` parameter of the font.
* Add test for font sizes, and for this a high-level function that returns an iterator of LTPage objects
* Add line to CHANGELOG
Fixes#186
* Tread the permissions (the /P entry) as unsigned long, fix#186
* handle negative values for p
* Extract function for resolving an twos-complement
* Add test for issue #352
* Add line to CHANGELOG.md
* Only ints can be converted to a uint using two's-complement method
* Standardize import style; multiple imports from same module on one line
Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
* Drop support for legacy Python 2
* Add python_requires to help pip
* Upgrade Python syntax with pyupgrade
* Upgrade Python syntax with pyupgrade --py3-plus
* Python 3 imports
* Replace six
* Update CONTRIBUTING.md
* Added line to changelog
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
* Code Refractor: Use code-style enforcement #312
* Add flake8 to travis-ci
* Remove python 2 3 comment on six library. 891 errors > 870 errors.
* Remove class and functions comments that consist of just the name. 870 errors > 855 errors.
* Fix flake8 errors in pdftypes.py. 855 errors > 833 errors.
* Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting
* Cleanup pdfinterp.py and add documentation from PDF Reference
* Cleanup pdfpage.py
* Cleanup pdffont.py
* Clean psparser.py
* Cleanup high_level.py
* Cleanup layout.py
* Cleanup pdfparser.py
* Cleanup pdfcolor.py
* Cleanup rijndael.py
* Cleanup converter.py
* Rename klass to cls if it is the class variable, to be more consistent with standard practice
* Cleanup cmap.py
* Cleanup pdfdevice.py
* flake8 ignore fontmetrics.py
* Cleanup test_pdfminer_psparser.py
* Fix flake8 in pdfdocument.py; 339 errors to go
* Fix flake8 utils.py; 326 errors togo
* pep8 correction for few files in /tools/ 328 > 160 to go (#342)
* pep8 correction for few files in /tools/ 328 > 160 to go
* pep8 correction: 160 > 5 to go
* Fix ascii85.py errors
* Fix error in getting index from target that does not exists
* Remove commented print lines
* Fix flake8 error in pdfinterp.py
* Fix python2 specific error by removing argument from print statement
* Ignore invalid python2 syntax
* Update contributing.md
* Added changelog
* Remove unused import
Co-authored-by: Fakabbir Amin <f4amin@gmail.com>
Fixes#171Fixes#199Fixes#118Fixes#178
Added: tests for building documentation and example code in documentation
Added: docstrings for common used functions and classes
Removed: old documentation
Changed: using a heap instead of a SortedList and avoid rebuilding the heap in each iteration
Changed: avoid potentially huge number of variable assignments in list comprehension.
Changed: avoid repeatly evaluating `obj is obj` in list comprehension by storing id(obj).
This is a squashed commit, the previous messages can be seen bellow
This is the 1st commit message:
Replaced .iteritems() usage for .items()
Fixed some python 2 leftovers, as discussed in #267. Also formatted code according to Black.\nThis possibly breaks some python 2 compatibility
This is the commit message #2:
Reverted formatting and more spread six usage
The PDF RM specifies that Descent should be negative. Fonts that claim
to have a positive Descent (not that it would make sense) always seem
to be wrong about this claim.
In python2, isinstance("", bytes) is true, causing enc() to
suppress any string input. This results in fontnames being lost
when running pdf2txt.py in python2.
As this check was not present in the original python2 version of
pdfminer, restrict it to only check when running in python3.
Instead of list comprehension which will call a function to get the integer value of the bytes directly convert it to bytearray which is more optimal structure for storing list of bytes.
In the PDFStream it's possible that the /Type element is not
present, but /type is. According to the spec, these are different
elements, but in the case in point they had the same meaning.
If PDFMiner is not running in STRICT mode and /Type doesn't resolve,
a fallback to /type is used to determine the tree type.
Fix errors with:
File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 850, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 860, in render_contents
self.init_resources(resources)
File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 360, in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 210, in get_font
font = self.get_font(None, subspec)
File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 201, in get_font
font = PDFCIDFont(self, spec)
File "/app/python/lib/python3.5/site-packages/pdfminer/pdffont.py", line 667, in __init__
BytesIO(self.fontfile.get_data()))
File "/app/python/lib/python3.5/site-packages/pdfminer/pdftypes.py", line 297, in get_data
self.decode()
File "/app/python/lib/python3.5/site-packages/pdfminer/pdftypes.py", line 278, in decode
if 'Predictor' in params:
TypeError: argument of type 'NoneType' is not iterable