pdfminer.six

Commit Graph

Author	SHA1	Message	Date
wind_chh	234c466372	Fix extraction of some cjk characters (#593) Fixes #566 * try to fix issue of some Chinese characters cannot be extracted correctly (#566). * format code to pass flake8 check. * fix typo and refer to issue 593. Co-authored-by: huan_cheng <huan_cheng@bestsign.cn> Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>	2021-08-26 21:05:03 +02:00
Pieter Marsman	4f65242750	Always try to get CMap, even if name is not recognized (#438) * Add trying to get cmap from pickle file. And cleaning up a bit. * Don't use keyword argument for dict.get * Add docs * Make _get_cmap_name static * Add test * Add CHANGELOG.md * Remove identity mappings from IDENTITY_ENCODER because that's now the default if the key is not in there * Add CJK characters to expected output of simple3.pdf * Fix line length * Add comment	2020-07-23 20:27:38 +02:00
Jake Stockwin	7254530d27	Fix ordering of textlines within a textbox when boxes_flow is disabled (#412) * Fix ordering of textlines within a textbox when boxes_flow is disabled * Add new test PDF sample	2020-05-09 15:37:49 +02:00
Jake Stockwin	68e2ae8632	Fix text coming in reverse order with boxes flow disabled (#399) Closes #398	2020-04-01 13:37:04 +02:00
Jake Stockwin	1a4a06da9f	Fix #392 Split out IO logic from high level functions (#393) * Allow file-like inputs to high level functions (#392) * PR Review - move open_filename to utils	2020-03-26 22:52:00 +01:00
Pieter Marsman	fff3ac2ba6	Fix bug in computing character bounding box (#348) * Remove scaling font height/width with size of font bounding box * Refactor LTChar bounding box computation * Change expected outcome of `python tools/pdf2txt.py samples/simple3.pdf`, because it looks like an improvement. However, when I view `samples/simple3.pdf` I don't see any text at all. The change in expected outcome is explained by the fact that the bounding boxes of characters can be different, depending on the `/FontBBox` parameter of the font. * Add test for font sizes, and for this a high-level function that returns an iterator of LTPage objects * Add line to CHANGELOG	2020-01-16 22:15:50 +01:00
Pieter Marsman	f3ab1bc61e	Enforce pep8 coding-style (#345) * Code Refractor: Use code-style enforcement #312 * Add flake8 to travis-ci * Remove python 2 3 comment on six library. 891 errors > 870 errors. * Remove class and functions comments that consist of just the name. 870 errors > 855 errors. * Fix flake8 errors in pdftypes.py. 855 errors > 833 errors. * Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting * Cleanup pdfinterp.py and add documentation from PDF Reference * Cleanup pdfpage.py * Cleanup pdffont.py * Clean psparser.py * Cleanup high_level.py * Cleanup layout.py * Cleanup pdfparser.py * Cleanup pdfcolor.py * Cleanup rijndael.py * Cleanup converter.py * Rename klass to cls if it is the class variable, to be more consistent with standard practice * Cleanup cmap.py * Cleanup pdfdevice.py * flake8 ignore fontmetrics.py * Cleanup test_pdfminer_psparser.py * Fix flake8 in pdfdocument.py; 339 errors to go * Fix flake8 utils.py; 326 errors togo * pep8 correction for few files in /tools/ 328 > 160 to go (#342) * pep8 correction for few files in /tools/ 328 > 160 to go * pep8 correction: 160 > 5 to go * Fix ascii85.py errors * Fix error in getting index from target that does not exists * Remove commented print lines * Fix flake8 error in pdfinterp.py * Fix python2 specific error by removing argument from print statement * Ignore invalid python2 syntax * Update contributing.md * Added changelog * Remove unused import Co-authored-by: Fakabbir Amin <f4amin@gmail.com>	2019-12-29 21:20:20 +01:00
Pieter Marsman	2bee7d8dcf	Fix wrong ordering of grouping textboxes introduced by #315. The first grouping of textboxes should be skipped if there are intermediate textboxes. (#335) Fixes #334	2019-11-10 12:18:49 +01:00
Igor Moura	40aa2533c9	Added: simple wrapper to extract text from pdf (#330) Fixes #327	2019-11-07 07:54:10 +01:00

9 Commits (da5b96828efdb184f6410c43fea30f7b7c893dfb)