Commit Graph

434 Commits (803a7d95988dfb7be46c3f3bc29416a3ad4aa959)

Author SHA1 Message Date
Pieter Marsman 803a7d9598 Release 20191110 2019-11-10 12:29:14 +01:00
Pieter Marsman 2bee7d8dcf
Fix wrong ordering of grouping textboxes introduced by #315. The first grouping of textboxes should be skipped if there are intermediate textboxes. (#335)
Fixes #334
2019-11-10 12:18:49 +01:00
Pieter Marsman 5c6fa8f986 Release 20191107 2019-11-07 21:52:44 +01:00
Pieter Marsman bc034c8e59
Create sphinx documentation for Read the Docs (#329)
Fixes #171
Fixes #199
Fixes #118
Fixes #178
Added: tests for building documentation and example code in documentation
Added: docstrings for commonly used functions and classes
Removed: old documentation
2019-11-07 21:12:34 +01:00
Igor Moura 40aa2533c9 Added: simple wrapper to extract text from pdf (#330)
Fixes #327
2019-11-07 07:54:10 +01:00
Martin Hasoň ed1b09c6f2 Fix debug logging for pdf2txt.py and dumppdf.py (#325)
Fixes #313
2019-11-06 21:47:19 +01:00
Pieter Marsman 33b16b3f07
Deprecate the use of _py2_no_more_posargs (#328)
Fixes #324
2019-11-02 10:29:39 +01:00
Jianfeng 44b223cf0a Speedup grouping of textboxes (#315)
Changed: using a heap instead of a SortedList and avoid rebuilding the heap in each iteration
Changed: avoid potentially huge number of variable assignments in list comprehension.
Changed: avoid repeatedly evaluating `obj is obj` in list comprehension by storing id(obj).
2019-10-31 09:22:58 +01:00
Pieter Marsman d88d6020a2
Remove webapp and other (un)helpful application references: django, cgi, and pyinstaller. (#320)
Fixes #314 
Fixes #105
2019-10-26 19:16:37 +02:00
Pieter Marsman a238a19999
Fix assertionerror when dumping pdf with reference to objid 0 (#318)
Fixes #94 
Added: test to check if `PDFObjectNotFound` error is raised if objid 0 is requested.
2019-10-25 22:49:58 +02:00
Serj Sintsov cb9cd8ea46 Use named logger instead of root logger (#236) 2019-10-22 20:52:43 +02:00
Pieter Marsman 373c6e7b97
Added: extraction of JBIG2 encoded images (#311)
And added test for pdf with JBIG2 image.

Fixes #26 
Closes #46
2019-10-22 17:37:06 +02:00
Pieter Marsman 694aa508c3 Release 20191020 2019-10-20 14:21:48 +02:00
Pieter Marsman adc4726e06
Add warning about dropping python2 support (#307)
Fix #303
2019-10-20 13:59:29 +02:00
Pieter Marsman 9fd7172f7b Cleanup utils.py 2019-10-17 12:14:02 +02:00
jet457 7e40fde320 Removing assertion in drange to allow equal inputs (#246) and mimic behaviour of built-in method range
Fixes #66, since it now allows the bbox to have 0 width or 0 height
Added tests for Plane since it is the API that uses drange
2019-10-17 12:04:25 +02:00
D.A.Bashkirtsev 4df6d4e5ca Changed: comparisons for image colorspace literals (#132)
Fixes #131

Changed: comparisons for image colorspace literals
Added: test for extracting images from pdfs
2019-10-15 16:11:54 +02:00
Pieter Marsman 63b2e09ac3
Merge pull request #203 from jbarlow83/negative-descent
Interpret font Descent as a negative number even if specified as positive
2019-10-13 20:06:52 +02:00
Tony Tong 106a09c5bb fix stroke color and non-stroke color in PDFGraphicState 2019-10-12 17:35:46 -04:00
Tata Ganesh f218996fe9
Merge pull request #273 from igormp/develop
Use resolve_all on PdfFont widths and bbox
2019-10-12 21:24:29 +05:30
Fakabbir Amin 7c03d96d25 Corrects Comment 2019-08-20 17:16:10 +05:30
Fakabbir Amin abd685fdc6 Corrects Code Comment 2019-08-20 17:13:27 +05:30
Fakabbir Amin 3d549ea48c Removes code comments 2019-08-20 16:48:40 +05:30
Igor Moura cf4641d877
Merge branch 'develop' into develop 2019-08-15 08:11:28 -03:00
Fakabbir Amin fe38695739
Merge branch 'develop' into pdfstream-as-cmap 2019-08-10 10:44:31 +05:30
Fakabbir Amin 5a0d8db052 Adds decoder for OnebyteIdentityH/V instead of using default CMap 2019-08-10 10:07:23 +05:30
Tata Ganesh 42e2c8143b
Merge pull request #263 from pietermarsman/261-glyph-list-specification
name2unicode() should follow the Adobe Glyph List Specification
2019-07-26 22:13:34 +05:30
Igor Moura 2f4518231f Use resolve_all on PdfFont widths and bbox
Fixes #268
2019-07-24 15:10:13 -03:00
Igor Moura 540df9f676 Replaced .iteritems() with six.iteritems() for Python 3 compat
This is a squashed commit, the previous messages can be seen below

This is the 1st commit message:

Replaced .iteritems() usage for .items()

Fixed some python 2 leftovers, as discussed in #267. Also formatted code according to Black.
This possibly breaks some python 2 compatibility

This is the commit message #2:

Reverted formatting and more spread six usage
2019-07-24 14:08:30 -03:00
Fakabbir Amin f1a4dcea88 Adds Test Cases, Neater Code For CMap Assignment 2019-07-24 11:56:06 +05:30
Fakabbir Amin fa400431f5 Adds Test, Removes Unnecessary Assumptions 2019-07-17 11:38:00 +05:30
Pieter Marsman 6f362f53fe Raise a `KeyError` with a useful message if `unicode2name()` does not match any glyph name. Use this message to log debug statements. 2019-07-16 08:52:24 +02:00
Pieter Marsman 0fb83366b6 Remove intermediate variable `full_stop` because it is just a dot 2019-07-16 08:49:57 +02:00
Fakabbir Amin cc40af3d2b Removes @property, Adds docstring 2019-07-15 14:21:21 +05:30
Pieter Marsman c597e95a9f Use KeyError to signal that the name does not resemble any unicode, this pattern is also used in the rest of pdfminer.six 2019-07-14 15:37:15 +02:00
Pieter Marsman 33cc9861ae Add docstring to Type1FontHeaderParser.get_encoding() that describes that the custom CharStrings of the font are mapped to '' 2019-07-14 15:19:17 +02:00
Pieter Marsman f0392f8049 Change implementation of name2unicode such that it follows the Adobe Glyph specs (with allowing lowercase) 2019-07-14 15:16:42 +02:00
Fakabbir Amin 8e4a82ad8b Corrects Indentation 2019-07-13 05:00:25 +05:30
Fakabbir Amin c022358c8d Encapsulates character map name 2019-07-13 04:52:24 +05:30
John Kesegich 8ab2e287be Handle PDFStream as character map name in PDFCIDFont 2019-02-25 11:42:30 -06:00
ganeshtata b6a5848208 FEAT: Release 20181108 2018-11-08 22:37:11 +05:30
Tata Ganesh e03ecab856
Merge pull request #141 from timb07/speedup_layout
Speed up layout of text boxes
2018-11-08 20:28:40 +05:30
James R. Barlow 2ede124142 Interpret font Descent as a negative number even if specified as positive
The PDF RM specifies that Descent should be negative. Fonts that claim
to have a positive Descent (not that it would make sense) always seem
to be wrong about this claim.
2018-11-03 23:17:48 -07:00
Tata Ganesh 259b29299e
Merge pull request #133 from timb07/speedup
Speed up handling of PDFs with large images
2018-07-15 11:27:35 +05:30
Martin Wolf edaf2c9e3f move unittest to main() 2018-06-26 00:51:51 +02:00
Martin Wolf eff3f19886 Merge remote-tracking branch 'upstream/master' 2018-06-25 23:32:52 +02:00
Tata Ganesh 9c7bdcc716
Merge pull request #157 from h2ri/master
decode cid: 160 and 173 to spaces
2018-06-25 11:19:27 +05:30
Charles Reid 7b08cdbff9 apply dos2unix to files in pdfminer/ and tools/ to remove \r\n windows line endings 2018-06-21 12:19:48 -07:00
Goulu 1db260609e
render_string must have 5 params in all PDFDevice classes (#158) 2018-06-21 10:21:26 +02:00
Guglielmetti Philippe 70624a64dd render_string() now takes 3 parameters, not 5 (reverted from commit 95b65536af) 2018-06-21 09:49:45 +02:00