Tata Ganesh
f218996fe9
Merge pull request #273 from igormp/develop
Use resolve_all on PdfFont widths and bbox
2019-10-12 21:24:29 +05:30
Fakabbir Amin
7c03d96d25
Corrects Comment
2019-08-20 17:16:10 +05:30
Fakabbir Amin
abd685fdc6
Corrects Code Comment
2019-08-20 17:13:27 +05:30
Fakabbir Amin
3d549ea48c
Removes code comments
2019-08-20 16:48:40 +05:30
Igor Moura
cf4641d877
Merge branch 'develop' into develop
2019-08-15 08:11:28 -03:00
Fakabbir Amin
fe38695739
Merge branch 'develop' into pdfstream-as-cmap
2019-08-10 10:44:31 +05:30
Fakabbir Amin
5a0d8db052
Adds decoder for OnebyteIdentityH/V instead of using default CMap
2019-08-10 10:07:23 +05:30
Tata Ganesh
42e2c8143b
Merge pull request #263 from pietermarsman/261-glyph-list-specification
name2unicode() should follow the Adobe Glyph List Specification
2019-07-26 22:13:34 +05:30
Igor Moura
2f4518231f
Use resolve_all on PdfFont widths and bbox
Fixes #268
2019-07-24 15:10:13 -03:00
Igor Moura
540df9f676
Replaced .iteritems() with six.iteritems() for Python 3 compat
This is a squashed commit, the previous messages can be seen below
This is the 1st commit message:
Replaced .iteritems() usage for .items()
Fixed some python 2 leftovers, as discussed in #267. Also formatted code according to Black.
This possibly breaks some python 2 compatibility
This is the commit message #2:
Reverted formatting and more spread six usage
2019-07-24 14:08:30 -03:00
Fakabbir Amin
f1a4dcea88
Adds Test Cases, Neater Code For CMap Assignment
2019-07-24 11:56:06 +05:30
Fakabbir Amin
fa400431f5
Adds Test, Removes Unnecessary Assumptions
2019-07-17 11:38:00 +05:30
Pieter Marsman
6f362f53fe
Raise a `KeyError` with a useful message if `unicode2name()` does not match any glyph name. Use this message to log debug statements.
2019-07-16 08:52:24 +02:00
Pieter Marsman
0fb83366b6
Remove intermediate variable `full_stop` because it is just a dot
2019-07-16 08:49:57 +02:00
Fakabbir Amin
cc40af3d2b
Removes @property, Adds docstring
2019-07-15 14:21:21 +05:30
Pieter Marsman
c597e95a9f
Use KeyError to signal that the name does not resemble any unicode, this pattern is also used in the rest of pdfminer.six
2019-07-14 15:37:15 +02:00
Pieter Marsman
33cc9861ae
Add docstring to Type1FontHeaderParser.get_encoding() that describes that the custom CharStrings of the font are mapped to ''
2019-07-14 15:19:17 +02:00
Pieter Marsman
f0392f8049
Change implementation of name2unicode such that it follows the Adobe Glyph specs (with allowing lowercase)
2019-07-14 15:16:42 +02:00
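As a rough illustration of what following the Adobe Glyph List Specification involves, here is a minimal sketch of the mapping rules: drop everything from the first period, split on underscores, then map each component via the glyph list, the `uniXXXX` form or the `uXXXX[XX]` form. It accepts lowercase hex digits, as the commit above notes. `AGL_EXCERPT` is a hypothetical four-entry stand-in for the real list, the font-specific ZapfDingbats rule is omitted, and raising `KeyError` when nothing maps is an assumption echoing the `KeyError` convention mentioned in these commits. This is not pdfminer.six's implementation.

```python
import re

# Hypothetical excerpt of the Adobe Glyph List (the real list has thousands of names)
AGL_EXCERPT = {"A": "A", "space": " ", "f": "f", "i": "i"}

# "uni" followed by one or more groups of four hex digits (lowercase tolerated)
UNI_RE = re.compile(r"^uni((?:[0-9A-Fa-f]{4})+)$")
# "u" followed by four to six hex digits
U_RE = re.compile(r"^u([0-9A-Fa-f]{4,6})$")


def _valid_code_point(cp):
    # Surrogates and values past U+10FFFF are not valid characters
    return cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def _component_to_unicode(component):
    if component in AGL_EXCERPT:
        return AGL_EXCERPT[component]
    m = UNI_RE.match(component)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_valid_code_point(cp) for cp in cps):
            return "".join(chr(cp) for cp in cps)
    m = U_RE.match(component)
    if m:
        cp = int(m.group(1), 16)
        if _valid_code_point(cp):
            return chr(cp)
    return ""  # the specification maps unrecognised components to ""


def name2unicode(glyph_name):
    base = glyph_name.split(".", 1)[0]  # drop everything from the first period
    text = "".join(_component_to_unicode(c) for c in base.split("_"))
    if not text:
        # Assumption: signal "no Unicode meaning" with KeyError
        raise KeyError("Could not convert glyph name %r to Unicode" % glyph_name)
    return text
```

For example, `name2unicode("f_i")` yields `"fi"` and `name2unicode("uni00410042")` yields `"AB"`.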
Fakabbir Amin
8e4a82ad8b
Corrects Indentation
2019-07-13 05:00:25 +05:30
Fakabbir Amin
c022358c8d
Encapsulates character map name
2019-07-13 04:52:24 +05:30
John Kesegich
8ab2e287be
Handle PDFStream as character map name in PDFCIDFont
2019-02-25 11:42:30 -06:00
ganeshtata
b6a5848208
FEAT: Release 20181108
2018-11-08 22:37:11 +05:30
Tata Ganesh
e03ecab856
Merge pull request #141 from timb07/speedup_layout
Speed up layout of text boxes
2018-11-08 20:28:40 +05:30
James R. Barlow
2ede124142
Interpret font Descent as a negative number even if specified as positive
The PDF RM specifies that Descent should be negative. Fonts that claim
to have a positive Descent (not that it would make sense) always seem
to be wrong about this claim.
2018-11-03 23:17:48 -07:00
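The rule described above amounts to forcing the sign. A minimal sketch, assuming the font descriptor is available as a plain dict; the helper name `normalized_descent` and the default of 0 for a missing entry are hypothetical:

```python
def normalized_descent(font_descriptor):
    """Return the /Descent of a font descriptor as a non-positive number.

    The PDF reference defines Descent as negative, so a positive value is
    treated as a sign error and flipped. A missing entry defaults to 0
    (an assumption made for this sketch).
    """
    return -abs(font_descriptor.get("Descent", 0))
```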
Tata Ganesh
259b29299e
Merge pull request #133 from timb07/speedup
Speed up handling of PDFs with large images
2018-07-15 11:27:35 +05:30
Martin Wolf
edaf2c9e3f
move unittest to main()
2018-06-26 00:51:51 +02:00
Martin Wolf
eff3f19886
Merge remote-tracking branch 'upstream/master'
2018-06-25 23:32:52 +02:00
Tata Ganesh
9c7bdcc716
Merge pull request #157 from h2ri/master
decode cid: 160 and 173 to spaces
2018-06-25 11:19:27 +05:30
Charles Reid
7b08cdbff9
apply dos2unix to files in pdfminer/ and tools/ to remove \r\n windows line endings
2018-06-21 12:19:48 -07:00
Goulu
1db260609e
render_string must have 5 params in all PDFDevice classes (#158)
2018-06-21 10:21:26 +02:00
Guglielmetti Philippe
70624a64dd
render_string() now takes 3 parameters, not 5 (reverted from commit 95b65536af)
2018-06-21 09:49:45 +02:00
Guglielmetti Philippe
95b65536af
render_string() now takes 3 parameters, not 5
2018-06-21 09:28:55 +02:00
Healthi
65eb0cef82
decode cid: 160 and 170 to spaces
2018-06-20 17:17:03 +05:30
Martin Wolf
26f80715ed
Merge remote-tracking branch 'upstream/master'
2018-06-20 13:27:18 +02:00
Tata Ganesh
67bc581bd3
Merge pull request #134 from timb07/issue_90
FIX: TypeError caused by bug in _parse_comment; #90 #89 #109
2018-06-14 09:27:34 +05:30
Tata Ganesh
7084d81bd1
Merge pull request #129 from clustree/xml-color
FEAT: Send color to XML conversion
2018-06-10 21:02:34 +05:30
Martin Wolf
4bdb3ba8cc
Fixes needed to be able to compile pdfminer.six with Cython
2018-04-12 00:05:38 +02:00
Tim Bell
1cbeaebfce
Fix Python 2.6 incompatibility
2018-04-11 10:34:15 +10:00
Tim Bell
0c8cf748fe
Fix copy-paste error
2018-04-11 10:15:32 +10:00
Tim Bell
8f8a78bb88
Remove now-unused csort()
2018-04-11 09:37:32 +10:00
Tim Bell
2dda2b12b4
Speedup layout with .sort() and sortedcontainers.SortedListWithKey()
2018-04-11 09:03:32 +10:00
Gregory Mori
335c25c045
only check for bytes input to enc() in python3
In python2, isinstance("", bytes) is true, causing enc() to
suppress any string input. This results in fontnames being lost
when running pdf2txt.py in python2.
As this check was not present in the original python2 version of
pdfminer, restrict it to only check when running in python3.
2018-04-09 12:21:59 -07:00
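The fix described above can be sketched as a version-gated type check. This is a simplified stand-in for an XML/HTML escaping helper, not pdfminer.six's `enc()`; the exact escaping rules and the `PY3` flag computed from `sys.version_info` are assumptions:

```python
import sys

PY3 = sys.version_info[0] >= 3


def enc(x):
    """Escape text for XML/HTML output (simplified, hypothetical version).

    Bytes are rejected only on Python 3, where bytes and str are distinct.
    On Python 2, str *is* bytes, so the same check would drop every plain
    string, including font names.
    """
    if PY3 and isinstance(x, bytes):
        return ""
    return (x.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace('"', "&quot;"))
```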
Tim Bell
981e3a575e
Fix TypeError caused by bug in _parse_comment; #90 #89 #109
2018-04-03 12:47:40 +10:00
Tim Bell
083f11b165
Fix cases where a bytearray doesn't work in place of bytes
2018-04-03 07:27:29 +10:00
Tim Bell
185ddeb2ab
Speed up handling of PDFs with large images with more minimal change
2018-04-03 07:21:21 +10:00
Tim Bell
fab1c9462c
Speed up handling of PDFs with large images
2018-03-29 14:21:31 +11:00
Tata Ganesh
eddf861fbd
Merge pull request #125 from yosida95/bytes-type
Fix type of an argument to PDFFont#decode to bytes in py3
2018-03-19 11:00:10 +05:30
Quentin Pradet
0911703eba
pdfcolor: Fix Python 2.6 compatibility
2018-03-06 14:53:11 +04:00
Quentin Pradet
94f3d61bb2
converter: Fix XML syntax
2018-03-06 14:41:52 +04:00
Quentin Pradet
2231f0892e
Send non-stroke color to XML conversion
Inspired by https://github.com/euske/pdfminer/pull/158 from @andruo11
and https://github.com/euske/pdfminer/pull/197 from @staccatosound.
2018-03-06 14:11:48 +04:00
Quentin Pradet
b6c63bedc6
Make DeviceGray the default color as it should be
2018-03-06 11:24:07 +04:00
Quentin Pradet
0ce9a29f83
Fix colorspace determinism with OrderedDict
2018-03-06 11:23:32 +04:00
Kohei YOSHIDA
a636cbcfd4
fix type of an argument to PDFFont#decode to bytes in py3
2018-02-20 13:42:09 +09:00
KOLANICH
3bf3c97bbb
Added a vector between 2 boxes which may be useful for users of the library
2018-02-16 14:49:12 +00:00
Tata Ganesh
3e6cc20cb2
Merge pull request #96 from sschuberth/patch-1
TrueTypeFont: Check for enough data to unpack
2018-01-31 18:26:54 +05:30
ganeshtata
1b88575e79
FIX: Null character replaced by blank
- The presence of the character '\0' was causing an error with some PDFs.
- It has been fixed by replacing all occurrences of '\0' with ''.
2017-11-08 12:50:50 +05:30
Sebastian Schuberth
fcd3e6ce00
Catch an error unpack might throw instead of checking the length before
2017-10-30 19:31:58 +01:00
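"Catch the error instead of checking the length" is the usual EAFP pattern around `struct.unpack`, which raises `struct.error` when the buffer is too short. A minimal sketch, assuming we read the big-endian 16-bit table count that follows the 4-byte version field of a TrueType offset table; the function name and the fallback of 0 are hypothetical:

```python
import struct


def read_table_count(data):
    """Return numTables from a TrueType offset table, or 0 if data is short.

    Rather than checking len(data) up front, attempt the unpack and treat
    struct.error (buffer too small) as "no tables".
    """
    try:
        (num_tables,) = struct.unpack(">H", data[4:6])
    except struct.error:
        return 0
    return num_tables
```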
Sebastian Schuberth
39428fb4f0
TrueTypeFont: Check for enough data to unpack
Fixes https://github.com/euske/pdfminer/issues/96
and https://github.com/euske/pdfminer/issues/144.
2017-10-16 12:35:04 +02:00
SUZUKI Masaya
d4118cf5e8
Enabled PDFDevice in the with statement (#88)
2017-08-18 08:15:04 +02:00
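Supporting the `with` statement means implementing the context-manager protocol. A minimal sketch, assuming the device releases its resources in a `close()` method; `SketchDevice` is a hypothetical stand-in, not the real `PDFDevice` class:

```python
class SketchDevice:
    """Hypothetical device that must be closed after use."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()   # always release resources, even on error
        return False   # do not suppress exceptions
```

Usage: `with SketchDevice() as device: ...` guarantees `device.close()` runs when the block exits.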
Peter Bittner
e39800f14c
Move package description into package docstring (#87)
Convert Windows/DOS line endings CR/LF to Unix LF (again!)
Add Python 3.6 to classifiers, update project URL
2017-08-18 08:13:15 +02:00
Venelin Stoykov
171cdcc69d
Microoptimization for singlebyte fonts (#84)
Instead of a list comprehension that calls a function to get the integer value of each byte, convert the data directly to a bytearray, which is a more efficient structure for storing a list of bytes.
2017-08-18 08:10:27 +02:00
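The difference described above can be seen side by side. Both hypothetical helpers return the integer codes of a single-byte string; the first makes a function call per byte, while the second lets `bytearray` produce the integers directly:

```python
def codes_via_ord(data):
    # One ord() call per one-byte slice
    return [ord(data[i:i + 1]) for i in range(len(data))]


def codes_via_bytearray(data):
    # Iterating a bytearray yields ints on both Python 2 and 3
    return list(bytearray(data))
```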
Venelin Stoykov
14de393d5e
Cleanup psparser (#83)
- Do not use bytesindex function. Use native slices instead
- Fix import ordering
2017-08-18 08:10:06 +02:00
Venelin Stoykov
496bfd0778
Remove leftover from removing shebangs (#81)
2017-08-18 08:09:00 +02:00
Venelin Stoykov
c2432c32f1
Fix assert message for PDFLayoutAnalyzer.end_page (#80)
stack is undefined
2017-08-18 08:08:08 +02:00
Philippe Guglielmetti
4c604828e8
v. 20170720
2017-07-20 21:35:49 +02:00
Philippe Guglielmetti
b010db6049
solves https://github.com/pdfminer/pdfminer.six/issues/65
2017-07-20 21:17:06 +02:00
Sergei Maertens
67bf5ab124
Compare byte with byte instead of int (#78)
2017-07-20 20:47:14 +02:00
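On Python 3, indexing a `bytes` object yields an `int`, so comparing it to a bytes literal is always false; slicing keeps the result as `bytes`. A minimal sketch of the byte-with-byte comparison; the helper name is hypothetical:

```python
def is_percent_at(buf, pos):
    """Return True if the byte at `pos` is '%' (hypothetical helper)."""
    # buf[pos] would be an int (37 for '%') on Python 3, never equal to b"%".
    # A one-byte slice stays bytes, and is b"" when pos is out of range.
    return buf[pos:pos + 1] == b"%"
```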
Sergei Maertens
3e364354da
Fixes #64 -- be less strict when inspecting a tree type (#76)
In the PDFStream it's possible that the /Type element is not
present, but /type is. According to the spec, these are different
elements, but in the case in point they had the same meaning.
If PDFMiner is not running in STRICT mode and /Type doesn't resolve,
a fallback to /type is used to determine the tree type.
2017-07-20 20:46:35 +02:00
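The fallback described above can be sketched with a plain dict standing in for the PDF dictionary. The function name is hypothetical, `STRICT = False` mirrors the non-strict default mentioned elsewhere in this log, and returning `None` when neither key exists is an assumption:

```python
STRICT = False  # assumption: non-strict mode, as in the default settings


def get_tree_type(obj, strict=STRICT):
    """Return the /Type of a page-tree node, falling back to /type.

    The lowercase key is consulted only outside strict mode.
    """
    if "Type" in obj:
        return obj["Type"]
    if not strict and "type" in obj:
        return obj["type"]
    return None
```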
Attila Szász
938419c476
Align dumppdf tool to modified data structures. (#73)
* Align dumppdf tool to modified data structures.
TOC page numbers should also work now, counting from 1.
* Update version number.
2017-07-20 20:46:11 +02:00
Sergei Maertens
d79612c455
Resolve unresolved PDFObjectRefs (#70)
Thank you!
2017-06-02 13:35:12 +02:00
Hugh Secker-Walker
488545ddc7
Add string expressions to asserts showing local data (#67)
2017-05-29 09:06:09 +02:00
Michał Pasternak
fe21725f07
Please replace pycrypto with pycryptodome (#63)
* Enable 3.6 and replace pycrypto with cryptodome
* Upgrade version number
2017-05-29 09:04:38 +02:00
Anton Oleynick
4bc0a0c105
Update pdftypes.py (#61)
Fix errors with:
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 850, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 860, in render_contents
    self.init_resources(resources)
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 360, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 210, in get_font
    font = self.get_font(None, subspec)
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 201, in get_font
    font = PDFCIDFont(self, spec)
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdffont.py", line 667, in __init__
    BytesIO(self.fontfile.get_data()))
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdftypes.py", line 297, in get_data
    self.decode()
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdftypes.py", line 278, in decode
    if 'Predictor' in params:
TypeError: argument of type 'NoneType' is not iterable
2017-05-29 08:55:02 +02:00
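The traceback ends at `if 'Predictor' in params:` with `params` set to `None`, which happens when a stream has no decode parameters. A minimal guard, assuming a plain dict stands in for the stream dictionary; the function name is hypothetical, and the fallback of 1 is the PDF reference's default Predictor (no prediction):

```python
def predictor_of(stream_dict):
    """Return the /Predictor from a stream's decode parameters, or 1.

    /DecodeParms is optional, so the lookup may produce None; normalising
    it to an empty dict avoids "argument of type 'NoneType' is not iterable".
    """
    params = stream_dict.get("DecodeParms") or {}
    if "Predictor" in params:
        return params["Predictor"]
    return 1
```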
Philippe Guglielmetti
baddb25df6
v 20170419 (patches a stupid bug from yesterday...)
2017-04-19 14:24:13 +02:00
Philippe Guglielmetti
82af7f0aac
issue #56 reproduced, solution attempt unsuccessful
2017-04-19 14:19:14 +02:00
Philippe Guglielmetti
cd92883925
logging (stupid bug)
2017-04-19 13:48:45 +02:00
Philippe Guglielmetti
11a4c8b6c1
version 20170418
2017-04-18 19:13:20 +02:00
Philippe Guglielmetti
7055862eaf
solves https://github.com/pdfminer/pdfminer.six/issues/50
2017-04-18 18:20:31 +02:00
Sergei Maertens
f2b0650ad5
Fixes #54 -- don't pass bytestrings through ord() (#55)
2017-04-18 16:57:53 +02:00
Andrew Baumann
9439a3a31a
Miscellaneous bug fixes (#47)
* utils.decode_text: fix "TypeError: ord() expected string of length 1, but int found"
fixes https://github.com/goulu/pdfminer/issues/24
* pdfinterp.execute: don't assume that every keyword name can be decoded as utf-8
fixes "'str' does not support the buffer interface", https://github.com/goulu/pdfminer/issues/23
* default settings.STRICT to False, for compatibility with the original pdfminer
* PDFCIDFont: handle font registry/orderings that may be PDFObjRefs
* utils.nunpack: handle 8-byte integers
2017-02-06 14:57:01 +01:00
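A function like `utils.nunpack` decodes a big-endian unsigned integer from a short byte string, choosing a `struct` format by length. The sketch below is written from the commit description, not copied from pdfminer.six; it adds the 8-byte case via the `>Q` format, and the `default` for empty input and the `TypeError` for unsupported lengths are assumptions:

```python
import struct


def nunpack(s, default=0):
    """Unpack a big-endian unsigned integer of 0-4 or 8 bytes."""
    n = len(s)
    if n == 0:
        return default
    if n == 1:
        return ord(s)  # ord() accepts a length-1 bytes object
    if n == 2:
        return struct.unpack(">H", s)[0]
    if n == 3:
        return struct.unpack(">L", b"\x00" + s)[0]  # pad to 4 bytes
    if n == 4:
        return struct.unpack(">L", s)[0]
    if n == 8:
        return struct.unpack(">Q", s)[0]  # the newly handled 8-byte case
    raise TypeError("invalid length: %d" % n)
```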
Philippe Guglielmetti
9b9d69aee9
image export works again with Py3 (issue #15)
https://github.com/pdfminer/pdfminer.six/issues/15
2017-01-20 10:11:19 +01:00
Philippe Guglielmetti
f094f0b380
v. 20170119 RC
2017-01-19 08:42:20 +01:00
Philippe Guglielmetti
52feb22eeb
Merge remote-tracking branch 'origin/master'
Conflicts:
	MANIFEST.in
	README.md
	pdfminer/latin_enc.py
	pdfminer/pdfdocument.py
	pdfminer/pdfinterp.py
	pdfminer/pdfpage.py
	pdfminer/pdftypes.py
	pdfminer/psparser.py
	pdfminer/utils.py
	samples/Makefile
	setup.py
2017-01-19 08:03:16 +01:00
Jin-tae Hwang
61d423d81c
bugfix: if fontname is bytes then skip (#43)
2016-12-14 17:34:16 +01:00
Gabriel Augendre
6cc4abbaa8
Fix import of Django settings (#41)
Settings in Django are imported as such, see https://docs.djangoproject.com/en/1.10/topics/settings/#using-settings-in-python-code
2016-11-26 20:26:23 +01:00
Humberto Pereira
e6ad15af79
Added painting information (#37)
* added color support to stroking and non stroking color spaces
* extended LTCurve, LTLine and LTRect to save painting information
* modified PDFLayoutAnalyzer to populate the shapes with painting information
2016-11-08 20:01:58 +01:00
Antonio Ercole De Luca
0fdebc6739
Removing all the "#!/usr/bin/env python" lines, they do not need for … (#34)
* Removing all the "#!/usr/bin/env python" lines, since they are not needed for python3, solving issue number #19.
* Restored all the shebangs in the tools and tests folders (because they are real executables) but used "#!/usr/bin/env python" instead of "#!/usr/bin/python" as this blog points out: https://www.peterbe.com/plog/importance-of-env
Removed also the shebang from pdfminer/psparser.py file.
2016-11-08 20:01:11 +01:00
Yusuke Shinyama
8150458718
Added: a simpler ordering mode when 1<F.
2016-09-26 18:06:34 +09:00
Friedrich Lindenberg
447adcf02f
fix STRICT reference
2016-09-24 12:03:22 +02:00
Friedrich Lindenberg
70918095cc
Return an empty list when no `Differences` are found.
2016-09-24 11:57:11 +02:00
Friedrich Lindenberg
865246bd0c
fix print, upstream: 0112112458
2016-09-23 15:04:07 +02:00
Friedrich Lindenberg
0cb13983f7
Backport LICENSE.
2016-09-23 14:57:28 +02:00
Friedrich Lindenberg
1820f96481
backport changes for upstream: #145, #95, #111, #117, #129, #132.
2016-09-23 14:31:31 +02:00
Jakub Wilk
5ddbecb551
Fix typos
2016-09-13 16:25:09 +02:00
Yusuke Shinyama
3068dcdb4a
Merge pull request #145 from vinayak-mehta/glyphlist_link
Replace old Adobe glyphlist link
2016-09-12 00:18:24 +09:00
Yusuke Shinyama
c753dbac4c
Merge pull request #117 from native-api/png_pred_errors
make ValueErrors descriptive
2016-09-11 23:55:34 +09:00
Yusuke Shinyama
f1dd9ea6d2
Merge pull request #129 from lucanaso/lucanaso-patch-1
Fixed for rendering non breaking spaces (cid:160)
2016-09-11 23:53:03 +09:00
Yusuke Shinyama
177a4ab937
Fixed : #132 (PDFStream.get_filters: support multiple parameterless filters)
2016-09-11 23:52:13 +09:00
Yusuke Shinyama
e95a483790
Merge pull request #134 from speedplane/feature/Fix-Get-Filters
Fix Bug with PDF Stream Decoder
2016-09-11 23:48:42 +09:00
Yusuke Shinyama
64fe538b24
Fixed : #114 (UnicodeEncodeError in PSLiteral)
2016-09-11 23:43:22 +09:00
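The #132 fix (supporting multiple parameterless filters in `PDFStream.get_filters`) comes down to pairing each filter with its parameters when `/DecodeParms` is absent. A minimal sketch, assuming lists, strings and dicts stand in for PDF objects; repeating a lone parameter entry across all filters is an assumption of this sketch:

```python
def get_filters(filters, params):
    """Pair each stream filter with its decode parameters (as a dict)."""
    if filters is None:
        return []
    if not isinstance(filters, list):
        filters = [filters]
    if not isinstance(params, list):
        # A missing (None) or single entry is repeated once per filter
        params = [params] * len(filters)
    return [(f, p or {}) for f, p in zip(filters, params)]
```

For example, `get_filters(["ASCIIHexDecode", "FlateDecode"], None)` pairs each filter with an empty dict instead of failing on the missing parameters.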