Quentin Pradet
2231f0892e
Send non-stroke color to XML conversion
Inspired by https://github.com/euske/pdfminer/pull/158 from @andruo11
and https://github.com/euske/pdfminer/pull/197 from @staccatosound.
2018-03-06 14:11:48 +04:00
Quentin Pradet
b6c63bedc6
Make DeviceGray the default color as it should be
2018-03-06 11:24:07 +04:00
Quentin Pradet
0ce9a29f83
Fix colorspace determinism with OrderedDict
2018-03-06 11:23:32 +04:00
Kohei YOSHIDA
a636cbcfd4
fix type of an argument to PDFFont#decode to bytes in py3
2018-02-20 13:42:09 +09:00
KOLANICH
3bf3c97bbb
Added a vector between 2 boxes which may be useful for users of the library
2018-02-16 14:49:12 +00:00
Tata Ganesh
3e6cc20cb2
Merge pull request #96 from sschuberth/patch-1
TrueTypeFont: Check for enough data to unpack
2018-01-31 18:26:54 +05:30
ganeshtata
1b88575e79
FIX: Null character replaced by blank
- The presence of the character '\0' was causing an error with some PDFs.
- It has been fixed by replacing all occurrences of '\0' with ''.
2017-11-08 12:50:50 +05:30
Sebastian Schuberth
fcd3e6ce00
Catch an error unpack might throw instead of checking the length before
2017-10-30 19:31:58 +01:00
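A minimal sketch of the "try the unpack, catch the error" approach from the two TrueTypeFont commits, assuming Python 3. The helper name `read_uint16` is hypothetical and this is not the code in pdffont.py. It only shows that `struct.unpack` raises `struct.error` when the buffer is too short, so the error can be caught instead of checking the length first.

```python
import struct
from typing import Optional


def read_uint16(data: bytes) -> Optional[int]:
    """Hypothetical helper: read a big-endian uint16 from the start of data."""
    try:
        # struct.error is raised if fewer than 2 bytes are available.
        (value,) = struct.unpack(">H", data[:2])
    except struct.error:
        return None
    return value


print(read_uint16(b"\x00\x05rest"))  # 5
print(read_uint16(b"\x00"))          # None
```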
Sebastian Schuberth
39428fb4f0
TrueTypeFont: Check for enough data to unpack
Fixes https://github.com/euske/pdfminer/issues/96
and https://github.com/euske/pdfminer/issues/144.
2017-10-16 12:35:04 +02:00
SUZUKI Masaya
d4118cf5e8
Enabled use of PDFDevice in a with statement (#88)
2017-08-18 08:15:04 +02:00
Peter Bittner
e39800f14c
Move package description into package docstring (#87)
Convert Windows/DOS line endings CR/LF to Unix LF (again!)
Add Python 3.6 to classifiers, update project URL
2017-08-18 08:13:15 +02:00
Venelin Stoykov
171cdcc69d
Microoptimization for singlebyte fonts (#84)
Instead of a list comprehension that calls a function to get the integer value of each byte, convert the data directly to a bytearray, which is a more efficient structure for storing a list of bytes.
2017-08-18 08:10:27 +02:00
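To illustrate the microoptimization, here is a sketch assuming Python 3. `byte_at` is a hypothetical stand-in for whatever per-byte accessor the old comprehension called (for example a helper such as `six.indexbytes`). Both forms yield the same integer values, but `bytearray` does the conversion in one call instead of one Python-level call per byte.

```python
def byte_at(data: bytes, i: int) -> int:
    # Hypothetical per-byte accessor, standing in for the old helper call.
    return data[i]


data = b"PDF"

# Old style: one function call per byte inside a list comprehension.
codes_list = [byte_at(data, i) for i in range(len(data))]

# New style: convert once; indexing a bytearray already yields ints.
codes_bytes = bytearray(data)

print(codes_list)         # [80, 68, 70]
print(list(codes_bytes))  # [80, 68, 70]
```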
Venelin Stoykov
14de393d5e
Cleanup psparser (#83)
- Do not use bytesindex function. Use native slices instead
- Fix import ordering
2017-08-18 08:10:06 +02:00
Venelin Stoykov
496bfd0778
Remove leftover from removing shebangs (#81)
2017-08-18 08:09:00 +02:00
Venelin Stoykov
c2432c32f1
Fix assert message for PDFLayoutAnalyzer.end_page (#80)
stack is undefined
2017-08-18 08:08:08 +02:00
Philippe Guglielmetti
4c604828e8
v. 20170720
2017-07-20 21:35:49 +02:00
Philippe Guglielmetti
b010db6049
solves https://github.com/pdfminer/pdfminer.six/issues/65
2017-07-20 21:17:06 +02:00
Sergei Maertens
67bf5ab124
Compare byte with byte instead of int (#78)
2017-07-20 20:47:14 +02:00
Sergei Maertens
3e364354da
Fixes #64 -- be less strict when inspecting a tree type (#76)
In the PDFStream it's possible that the /Type element is not
present, but /type is. According to the spec, these are different
elements, but in the case in point they had the same meaning.
If PDFMiner is not running in STRICT mode and /Type doesn't resolve,
a fallback to /type is used to determine the tree type.
2017-07-20 20:46:35 +02:00
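A sketch of the non-strict fallback described in the #76 commit, assuming the dictionary is a plain Python mapping with string keys. pdfminer itself uses PDF name objects, and the function name `tree_type` and the module-level `STRICT` flag here are hypothetical.

```python
from typing import Any, Mapping, Optional

STRICT = False  # hypothetical flag, modeled on the idea of a STRICT setting


def tree_type(obj: Mapping[str, Any], strict: bool = STRICT) -> Optional[Any]:
    """Hypothetical helper: return /Type, falling back to /type when not strict."""
    value = obj.get("Type")
    if value is None and not strict:
        # Non-conforming files sometimes use the lowercase key instead.
        value = obj.get("type")
    return value


print(tree_type({"Type": "Pages"}))               # Pages
print(tree_type({"type": "Pages"}))               # Pages
print(tree_type({"type": "Pages"}, strict=True))  # None
```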
Attila Szász
938419c476
Align dumppdf tool to modified data structures. (#73)
* Align dumppdf tool to modified data structures.
TOC page numbers should also work now, counting from 1.
* Update version number.
2017-07-20 20:46:11 +02:00
Sergei Maertens
d79612c455
Resolve unresolved PDFObjectRefs (#70)
Thank you!
2017-06-02 13:35:12 +02:00
Hugh Secker-Walker
488545ddc7
Add string expressions to asserts showing local data (#67)
2017-05-29 09:06:09 +02:00
Michał Pasternak
fe21725f07
Please replace pycrypto with pycryptodome (#63)
* Enable 3.6 and replace pycrypto with cryptodome
* Upgrade version number
2017-05-29 09:04:38 +02:00
Anton Oleynick
4bc0a0c105
Update pdftypes.py (#61)
Fix errors with:
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 850, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 860, in render_contents
    self.init_resources(resources)
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 360, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 210, in get_font
    font = self.get_font(None, subspec)
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdfinterp.py", line 201, in get_font
    font = PDFCIDFont(self, spec)
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdffont.py", line 667, in __init__
    BytesIO(self.fontfile.get_data()))
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdftypes.py", line 297, in get_data
    self.decode()
  File "/app/python/lib/python3.5/site-packages/pdfminer/pdftypes.py", line 278, in decode
    if 'Predictor' in params:
TypeError: argument of type 'NoneType' is not iterable
2017-05-29 08:55:02 +02:00
Philippe Guglielmetti
baddb25df6
v 20170419 (patches a stupid bug from yesterday...)
2017-04-19 14:24:13 +02:00
Philippe Guglielmetti
82af7f0aac
issue #56 reproduced, solution attempt unsuccessful
2017-04-19 14:19:14 +02:00
Philippe Guglielmetti
cd92883925
logging (stupid bug)
2017-04-19 13:48:45 +02:00
Philippe Guglielmetti
11a4c8b6c1
version 20170418
2017-04-18 19:13:20 +02:00
Philippe Guglielmetti
7055862eaf
solves https://github.com/pdfminer/pdfminer.six/issues/50
2017-04-18 18:20:31 +02:00
Sergei Maertens
f2b0650ad5
Fixes #54 -- don't pass bytestrings through ord() (#55)
2017-04-18 16:57:53 +02:00
Andrew Baumann
9439a3a31a
Miscellaneous bug fixes (#47)
* utils.decode_text: fix "TypeError: ord() expected string of length 1, but int found"
fixes https://github.com/goulu/pdfminer/issues/24
* pdfinterp.execute: don't assume that every keyword name can be decoded as utf-8
fixes "'str' does not support the buffer interface", https://github.com/goulu/pdfminer/issues/23
* default settings.STRICT to False, for compatibility with the original pdfminer
* PDFCIDFont: handle font registry/orderings that may be PDFObjRefs
* utils.nunpack: handle 8-byte integers
2017-02-06 14:57:01 +01:00
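The `utils.nunpack` change can be pictured with the sketch below, assuming Python 3 and big-endian unsigned input. It follows the idea in the commit (adding an 8-byte case via struct's `>Q` format), but it is not copied from utils.py, whose exact code may differ.

```python
import struct


def nunpack(s: bytes, default: int = 0) -> int:
    """Sketch: unpack a 1-4 or 8 byte big-endian unsigned integer."""
    n = len(s)
    if n == 0:
        return default
    if n == 1:
        return s[0]
    if n == 2:
        return struct.unpack(">H", s)[0]
    if n == 3:
        # Pad to 4 bytes so the 32-bit format can be used.
        return struct.unpack(">L", b"\x00" + s)[0]
    if n == 4:
        return struct.unpack(">L", s)[0]
    if n == 8:
        # The newly handled case: 8-byte (64-bit) integers.
        return struct.unpack(">Q", s)[0]
    raise TypeError("invalid length: %d" % n)


print(nunpack(b"\x00\x00\x00\x01\x00\x00\x00\x00"))  # 4294967296
```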
Philippe Guglielmetti
9b9d69aee9
image export works again with Py3 (issue #15)
https://github.com/pdfminer/pdfminer.six/issues/15
2017-01-20 10:11:19 +01:00
Philippe Guglielmetti
f094f0b380
v. 20170119 RC
2017-01-19 08:42:20 +01:00
Philippe Guglielmetti
52feb22eeb
Merge remote-tracking branch 'origin/master'
Conflicts:
MANIFEST.in
README.md
pdfminer/latin_enc.py
pdfminer/pdfdocument.py
pdfminer/pdfinterp.py
pdfminer/pdfpage.py
pdfminer/pdftypes.py
pdfminer/psparser.py
pdfminer/utils.py
samples/Makefile
setup.py
2017-01-19 08:03:16 +01:00
Jin-tae Hwang
61d423d81c
bugfix: if fontname is bytes then skip (#43)
2016-12-14 17:34:16 +01:00
Gabriel Augendre
6cc4abbaa8
Fix import of Django settings (#41)
Settings in Django are imported as such, see https://docs.djangoproject.com/en/1.10/topics/settings/#using-settings-in-python-code
2016-11-26 20:26:23 +01:00
Humberto Pereira
e6ad15af79
Added painting information (#37)
* added color support to stroking and non-stroking color spaces
* extended LTCurve, LTLine and LTRect to save painting information
* modified PDFLayoutAnalyzer to populate the shapes with painting information
2016-11-08 20:01:58 +01:00
Antonio Ercole De Luca
0fdebc6739
Removing all the "#!/usr/bin/env python" lines, they do not need for … (#34)
* Removing all the "#!/usr/bin/env python" lines, since they are not needed for Python 3; this solves issue #19.
* Restored all the shebangs in the tools and tests folders (because they are real executables), but used "#!/usr/bin/env python" instead of "#!/usr/bin/python", as this blog points out: https://www.peterbe.com/plog/importance-of-env
Also removed the shebang from the pdfminer/psparser.py file.
2016-11-08 20:01:11 +01:00
Yusuke Shinyama
8150458718
Added: a simpler ordering mode when 1<F.
2016-09-26 18:06:34 +09:00
Friedrich Lindenberg
447adcf02f
fix STRICT reference
2016-09-24 12:03:22 +02:00
Friedrich Lindenberg
70918095cc
Return an empty list when no `Differences` are found.
2016-09-24 11:57:11 +02:00
Friedrich Lindenberg
865246bd0c
fix print, upstream: 0112112458
2016-09-23 15:04:07 +02:00
Friedrich Lindenberg
0cb13983f7
Backport LICENSE.
2016-09-23 14:57:28 +02:00
Friedrich Lindenberg
1820f96481
backport changes for upstream: #145, #95, #111, #117, #129, #132.
2016-09-23 14:31:31 +02:00
Jakub Wilk
5ddbecb551
Fix typos
2016-09-13 16:25:09 +02:00
Yusuke Shinyama
3068dcdb4a
Merge pull request #145 from vinayak-mehta/glyphlist_link
Replace old Adobe glyphlist link
2016-09-12 00:18:24 +09:00
Yusuke Shinyama
c753dbac4c
Merge pull request #117 from native-api/png_pred_errors
make ValueError's descriptive
2016-09-11 23:55:34 +09:00
Yusuke Shinyama
f1dd9ea6d2
Merge pull request #129 from lucanaso/lucanaso-patch-1
Fixed for rendering non breaking spaces (cid:160)
2016-09-11 23:53:03 +09:00
Yusuke Shinyama
177a4ab937
Fixed : #132 (PDFStream.get_filters: support multiple parameterless filters)
2016-09-11 23:52:13 +09:00
Yusuke Shinyama
e95a483790
Merge pull request #134 from speedplane/feature/Fix-Get-Filters
Fix Bug with PDF Stream Decoder
2016-09-11 23:48:42 +09:00
Yusuke Shinyama
64fe538b24
Fixed : #114 (UnicodeEncodeError in PSLiteral)
2016-09-11 23:43:22 +09:00
Vinayak Mehta
2926002017
Replace old Adobe glyphlist link
2016-09-08 16:34:53 +05:30
Philippe Guglielmetti
881ea17553
v 20160614
2016-06-14 19:02:07 +02:00
speedplane
2049462f6f
Revert changes unrelated to this branch.
2016-06-13 23:42:21 -04:00
speedplane
b0b8818a41
Fix a bug with pdfminer which occurs when two or more filters are applied to a stream, even though no parameters are specified. The code would previously drop all of the filters after the first due to misapplication of the zip function.
2016-06-13 23:35:11 -04:00
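The zip() pitfall behind this fix can be reproduced in a few lines. This is a sketch assuming Python 3; the helper names are hypothetical, and the decode parameters are modeled as a list with a single entry, which is one way the described truncation can occur. The filter names are standard PDF filter names.

```python
from typing import Any, List, Optional, Tuple


def pair_filters_buggy(filters: List[str], params: List[Any]) -> List[Tuple[str, Any]]:
    # zip() stops at the shorter input, so extra filters are silently dropped.
    return list(zip(filters, params))


def pair_filters(filters: List[str], params: Optional[List[Any]]) -> List[Tuple[str, Any]]:
    # Pad the parameter list with empty dicts so every filter is kept.
    padded = list(params or [])
    padded += [{}] * (len(filters) - len(padded))
    return list(zip(filters, padded))


filters = ["FlateDecode", "ASCII85Decode"]
print(pair_filters_buggy(filters, [{}]))  # [('FlateDecode', {})]
print(pair_filters(filters, [{}]))        # [('FlateDecode', {}), ('ASCII85Decode', {})]
```

Padding (rather than requiring callers to supply one parameter dict per filter) keeps parameterless filters working, which is the case the commit describes.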
Friedrich Lindenberg
1d54ecd31c
Make the logger run in a namespace.
2016-05-20 21:12:05 +02:00
Philippe Guglielmetti
21fd2bbd23
v 20160202 with Py 2.6 & Py 3.5 support
2016-02-02 15:38:51 +01:00
Goulu
5a23fad6fd
Merge pull request #14 from orangain/close-device
Close device to write footer of xml/html files
2016-01-18 11:22:35 +01:00
Goulu
2103e5875e
Merge pull request #13 from orangain/include-cmap
Include compiled cmap resources to simplify installation for CJK languages
2016-01-18 11:22:08 +01:00
Steve Hair
92c71436b9
Improved settings management
2016-01-10 12:17:38 -05:00
orangain
f8a051adbd
Close device to write footer of xml/html files
2015-12-27 20:57:00 +09:00
orangain
f1d5d681b6
Include compiled cmap resources to simplify installation for CJK languages
* Run `make cmap` and `git add pdfminer/cmap`.
* Modify MANIFEST.in not to include cmaprsrc dir in the sdist package.
* Add pdfminer/cmap/README.txt to include license in the sdist package.
* Remove installation guide specific to CJK languages from README.md.
2015-12-27 13:32:29 +09:00
lucanaso
63bb3caec2
Fixed for rendering non breaking spaces (cid:160)
As stated in the PDF specification ISO 32000-1, table in Annex D.2 Latin Character Set and Encodings, pages 653 to 656 (available here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf ):
"The SPACE character shall also be encoded as 312 in MacRomanEncoding and as 240 in WinAnsiEncoding. This duplicate code shall signify a nonbreaking space; it shall be typographically the same as (U+003A) SPACE."
The duplicate key was missing, therefore PDFMiner was returning the string "(cid:160)".
This fix adds the duplicate key in latin_enc.py
glyphlist.py does not need to be modified as it already contains a key for non breaking space https://github.com/lucanaso/pdfminer/blob/master/pdfminer/glyphlist.py#L2755 .
2015-12-09 16:47:32 +01:00
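A toy model of this fix, assuming Python 3. The dictionaries below are hypothetical fragments, not the layout of latin_enc.py, and the glyph name "nbspace" for the new entry is an assumption (the commit only says glyphlist.py already had a key for the non-breaking space). It shows why the unmapped WinAnsi code 240 (octal), which is 160 decimal, surfaced as "(cid:160)".

```python
# Octal codes quoted in the commit from ISO 32000-1, Annex D.2.
MAC_ROMAN_NBSP = 0o312  # 202 decimal
WIN_ANSI_NBSP = 0o240   # 160 decimal

# Trimmed glyph-name -> Unicode map (hypothetical subset of a glyph list).
GLYPH_TO_UNICODE = {"space": "\u0020", "nbspace": "\u00a0"}


def decode_code(code: int, encoding: dict) -> str:
    """Hypothetical decoder: code -> glyph name -> Unicode, else a (cid:N) marker."""
    name = encoding.get(code)
    if name is None:
        return "(cid:%d)" % code
    return GLYPH_TO_UNICODE[name]


# Hypothetical WinAnsi fragments before and after adding the duplicate entry.
win_before = {0o40: "space"}
win_after = {0o40: "space", WIN_ANSI_NBSP: "nbspace"}

print(decode_code(WIN_ANSI_NBSP, win_before))              # (cid:160)
print(decode_code(WIN_ANSI_NBSP, win_after) == "\u00a0")   # True
```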
Chris Hager
8149be1669
bugfixes
2015-12-06 00:17:58 +01:00
Chris Hager
2e1be5721f
removed settings.ENFORCE_CHECK_EXTRACTABLE
2015-11-01 22:34:18 +01:00
Chris Hager
b686dd0139
pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE
2015-11-01 22:28:08 +01:00
Ivan Pozdeev
63c9378b8b
make ValueError's descriptive
2015-08-10 03:14:51 +03:00
Alex Zagorodniuk
131cb1ea92
change STRICT to be a settings attribute
2015-06-22 19:08:35 -04:00
Goulu
623bd98452
Update __init__.py
version 20150601
2015-06-01 10:21:51 +02:00
Cathal Garvey
403711ed13
Whoops, forgot to version-gate chardet in the actual code. Thanks Travis!
2015-05-30 19:33:35 +01:00
Cathal Garvey
a2ad7a6d03
Fixed some bugs preventing all tests from passing in Py2.
2015-05-30 18:02:29 +01:00
Cathal Garvey
79c97ac221
Docstrings.
2015-05-30 17:16:06 +01:00
Cathal Garvey
3b7edba48c
Forgot to add the actual compartmentalised function..
2015-05-30 17:04:28 +01:00
Cathal Garvey
08cb217983
Progress, progress.. not nearly atomic enough, sorry.
2015-05-30 16:14:24 +01:00
Cathal Garvey
1b47bed306
Many changes to make pdf2txt.py work better in Py3, some in that script, others in module!
Sorry, changes should have been more atomic.
*In pdf2txt.py:*
* Re-wrote main function to use argparse instead of optparse.
* Manually tested in Py2/Py3 to get partial consistency.
* Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway.
* Py2 mode *probably* unchanged, cannot find any bugs yet...
* Kept old main function for posterity, for now.
*In utils:*
* Added a few compatibility functions (some string hax required chardet, new dependency):
- make_compat_bytes(in_str)-> (py3->bytes | py2->str)
- make_compat_str(in_str)-> (str)
- compatible_encode_method(bytesorstring, encoding, erraction)-> (str)
*In pdfdevice:*
* To handle different output filetypes in Py3, injected lots of calls to new utils methods,
as well as some six.PYX checks and logic. These changes are largely responsible for
enhanced Py2/Py3 consistency.
*In converter:*
* To handle output filetypes in Py2, injected a few checks and fixes particularly around the
py2 `str.encode` method and its *assumed* usual use-analogies in Py3.
2015-05-17 21:08:57 +01:00
Yusuke Shinyama
14fd0fd2d6
Fixed : #84 (fontname was in unicode)
2015-04-05 19:02:02 +09:00
speedplane
806ee603ff
More fixes to layout. The compute neighbors function for horizontal lines is only intended to find neighbors on differing lines. However, it's entirely possible that horizontal neighbors could appear.
This commit finds horizontal neighbors in a horizontal line and merges them together into a single horizontal line if necessary. This leads to much better text extraction if the PDF was created in a funky way.
For example (test case coming), I have seen PDFs which are written almost like vertical columns, but the text is entirely horizontal.
2014-12-12 00:36:59 -05:00
speedplane
45170e7183
There are a number of relatively complex changes here. Comments are in order of where the change appears.
1. When detecting text in a horizontal line, we already add a space between words if they are separated by more than word_margin. However, we now only do it if there is not already an existing space. This prevents multiple spaces being placed between words.
2. Detect a horizontal line if the line is zero width. This improves our detection of horizontal lines when looking for both horizontal and vertical.
3. Don't detect a vertical line if the previous letter is whitespace. This prevents double spaces being caught as vertical lines.
4. Improve upon an unfortunate O(N^2) algorithm which I have seen taking many minutes to execute. Unfortunately, while the "fix" reduces algorithmic complexity, it isn't technically correct, so we only do it when we know things will take a long time.
2014-12-12 00:36:59 -05:00
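Point 1 of the commit above can be illustrated with a simplified sketch, assuming Python 3 and a hypothetical `Char` record. It treats `word_margin` as an absolute gap threshold, whereas pdfminer scales the margin by the character size; only the "don't add a space next to existing whitespace" rule is shown.

```python
from typing import List, NamedTuple, Optional


class Char(NamedTuple):
    # Hypothetical minimal character box: left/right edges and its text.
    x0: float
    x1: float
    text: str


def join_line(chars: List[Char], word_margin: float) -> str:
    """Join a horizontal run of characters, inserting a space at a wide gap
    only when neither side of the gap is already whitespace."""
    out: List[str] = []
    prev: Optional[Char] = None
    for ch in chars:
        if prev is not None:
            gap = ch.x0 - prev.x1
            if gap > word_margin and not prev.text.isspace() and not ch.text.isspace():
                out.append(" ")
        out.append(ch.text)
        prev = ch
    return "".join(out)


chars = [Char(0, 5, "a"), Char(6, 8, " "), Char(20, 25, "b")]
print(repr(join_line(chars, word_margin=3.0)))  # 'a b' (no double space)
```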
Yusuke Shinyama
0112112458
Fixed: crash on invalid chr number.
2014-12-09 22:55:47 +09:00
enkore
d0379a2c44
Fix utils.decode_text
2014-12-04 17:09:52 +01:00
speedplane
36977fbe08
Add debug flags for much of the debug output.
2014-11-11 23:36:58 -05:00
speedplane
ecc4d05675
Fix a unicode conversion bug.
See https://github.com/euske/pdfminer/issues/75
2014-11-11 23:34:33 -05:00
cybjit
515687e1bb
more xrange to range
2014-09-16 23:17:31 +02:00
cybjit
9b2e29396b
apply_png_predictor py3
2014-09-16 22:59:29 +02:00
cybjit
ad05121c69
password py3
2014-09-16 22:59:00 +02:00
cybjit
14585987c3
keep password api unicode, latin1 or utf-8 is encoded in handler
2014-09-16 22:58:25 +02:00
cybjit
2260f77b19
fix dict_value usage in strict mode
2014-09-16 22:57:29 +02:00
cybjit
51a361c145
clean up HTMLConverter and XMLConverter encoding
2014-09-16 22:57:00 +02:00
Goulu
8861d7e0ed
version 20140915 pushed to PyPi as pdfminer_six
2014-09-15 10:33:04 +02:00
cybjit
39942b6642
avoid string formatting when not logging
2014-09-12 00:29:31 +02:00
cybjit
01821c7d1e
rename bytes to avoid built-in collision
2014-09-12 00:29:31 +02:00
cybjit
31e6afc7cf
faster and simpler bytes implementation
2014-09-12 00:29:30 +02:00
cybjit
cba5a42ba8
decipher_all bytes
2014-09-12 00:29:30 +02:00
cybjit
6357e2da80
code2cid uses int, not byte
2014-09-12 00:29:27 +02:00
cybjit
9b0a3ee53e
decode cmap font name
2014-09-11 23:30:02 +02:00
cybjit
a6f31a713d
cmap bytes and decode
2014-09-07 18:41:04 +02:00
cybjit
cc733c8217
fixes for ARC4
2014-09-07 18:38:22 +02:00
cybjit
f9a67db89b
change xrange to range
2014-09-07 18:36:12 +02:00
cybjit
0a2d90c051
pdf2txt: do not double encode stdout
2014-09-07 18:34:11 +02:00
unknown
58b8492783
no logging in travis.ci
2014-09-04 10:19:50 +02:00
unknown
1c93468c7e
faster, less verbose tests
2014-09-04 10:02:29 +02:00
unknown
4ab48d1803
Python 3.4 compatibility + tests
2014-09-04 09:36:19 +02:00
unknown
29c07ea770
Python 3.4 support and tests
2014-09-03 15:26:08 +02:00
unknown
a6475b61b4
Python 3.4 support added and tested
2014-09-03 13:17:41 +02:00
unknown
846cd18186
Python 3.4 support
2014-09-02 15:49:46 +02:00
unknown
faea7291a8
tests pass under Py 2.7 and 3.4
2014-09-01 14:16:49 +02:00
Yusuke Shinyama
b0e035c24f
Style fix: always have an explicit return.
2014-07-15 21:38:29 +09:00
Yusuke Shinyama
f5b5e31921
Fixed: DecodeParms array support.
2014-07-09 19:07:27 +09:00
Yusuke Shinyama
137fc3a1ae
Use KWD instead of token.name.
2014-06-30 19:15:21 +09:00
Yusuke Shinyama
1ccfaff411
String-Bytes distinction (first attempt).
2014-06-30 19:05:56 +09:00
Yusuke Shinyama
8791355e1d
Cleanup imports. Use relative imports.
2014-06-26 18:12:39 +09:00
Yusuke Shinyama
2e900e5d10
Fixed for consistent test results. (hopefully...)
2014-06-26 17:41:31 +09:00
Yusuke Shinyama
fe86b4e64e
Changed: StringIO -> io.BytesIO
2014-06-25 19:55:41 +09:00
Yusuke Shinyama
44074b42ea
Added: stripcontrol for XMLConverter (-S option)
2014-06-22 00:33:00 +09:00
Yusuke Shinyama
81391c09f4
Fixed : #56 (with a derpy fix)
2014-06-18 19:11:45 +09:00
Yusuke Shinyama
bb866ae148
Changed: new except syntax (2.6 or above).
2014-06-16 18:50:07 +09:00
Yusuke Shinyama
28e96ba3d0
Use print as a function.
2014-06-15 12:14:33 +09:00
Yusuke Shinyama
0387a6c260
Removed: tuple-unpacking args.
2014-06-15 12:12:13 +09:00
Yusuke Shinyama
a8ec99a848
More autotest tweaks.
2014-06-15 10:52:59 +09:00
Yusuke Shinyama
1384a3fe8d
Code cleanup: removed some debug flags.
2014-06-14 15:43:10 +09:00
Yusuke Shinyama
d9680fca7e
Plane: preserve the object order so that the test result is always consistent.
2014-06-14 14:44:53 +09:00
Yusuke Shinyama
aed248610c
Fixed: dependency on pygame in a unittest.
2014-06-14 12:05:26 +09:00
Yusuke Shinyama
8e14ebf4e1
Use logging module instead of print.
2014-06-14 12:00:49 +09:00
Yusuke Shinyama
8e8e22c095
Fixed a layout bug introduced at c97ec304.
2014-06-13 23:05:04 +09:00
numion
a4997d6f10
Implement revision 4 and 5 encryption handler.
2014-05-19 16:27:43 +02:00
Michael R. Hines
ae2547b0f2
Stop throwing exception on LITERALS_DCT_DECODE
I have PDF documents with image streams and two filters; don't throw exceptions on the second one (DCT).
2014-05-14 13:25:30 +08:00
Yusuke Shinyama
6b6fc264ff
Code refactoring: CMap and UnicodeMap both inherit CMapBase.
2014-04-16 18:57:16 +09:00
Yusuke Shinyama
b09c37902f
Fixed: issue #48 (thanks to speedplane)
2014-04-09 17:55:50 +09:00
Yusuke Shinyama
7b354c7ab3
Version 20140328
2014-03-28 22:49:18 +09:00
Yusuke Shinyama
340387bfc6
Cleanup: isinstance
2014-03-28 17:50:59 +09:00
Yusuke Shinyama
7849c8724a
Fixed: PDFXRefStream.get_objids returns invalid objids.
2014-03-28 17:29:26 +09:00
Yusuke Shinyama
57adad55d7
Revert the wrong fix.
2014-03-28 17:24:03 +09:00
Yusuke Shinyama
b18e8c549d
Version 20140327
2014-03-28 00:19:52 +09:00
Yusuke Shinyama
ee47a6603a
Fixed: issue #45
2014-03-28 00:18:17 +09:00
Yusuke Shinyama
ab03037444
Version 20140324
2014-03-24 21:03:46 +09:00
Yusuke Shinyama
4b2beba398
Code cleanup.
2014-03-24 20:59:24 +09:00
Yusuke Shinyama
f9079e4c0a
Fixed dumppdf.py issues.
2014-03-24 20:55:00 +09:00
Yusuke Shinyama
607be269ab
Applied a patch by Axel Kaiser.
2014-03-24 20:45:35 +09:00
Yusuke Shinyama
d7c4ff28e9
Applied a patch by Axel Kaiser.
2014-03-24 20:39:30 +09:00
Yusuke Shinyama
636d4caeb3
Fixed the PNG predictor bug. Thanks to Gabor Molnar.
2014-03-24 19:57:05 +09:00
Yusuke Shinyama
c97ec3048e
Changed / to // for clarity.
2013-11-26 21:35:16 +09:00
Yusuke Shinyama
b589da51b7
Fix for malformed PDFs.
2013-11-26 21:27:45 +09:00
Yusuke Shinyama
cf1e3c9973
Version bump!
2013-11-13 14:52:01 +09:00
Yusuke Shinyama
acad011e3f
Code cleanup.
2013-11-11 20:46:30 +09:00
Yusuke Shinyama
cbef967fbf
Renamed: LTAnon -> LTAnno
2013-11-11 19:17:45 +09:00
Yusuke Shinyama
c8b6d4112a
Fixed: crash with negative layout bbox.
2013-11-09 15:10:14 +09:00
Yusuke Shinyama
2b56b2eedf
Merged.
2013-11-07 19:50:41 +09:00
Matthew Duggan
2caa5edc25
PEP8: Whitespace changes to match pep8
2013-11-07 17:35:04 +09:00
Matthew Duggan
c1da8b835c
PEP8: Remove trailing whitespace
2013-11-07 16:14:53 +09:00
Matthew Duggan
024b821056
Make pyflakes happy by defining variable
2013-11-07 16:10:14 +09:00
Matthew Duggan
10a68c83bd
Remove unused imports identified by pyflakes
2013-11-07 16:09:44 +09:00
Yusuke Shinyama
4ef81ae9d8
Improved word spacing.
2013-11-05 18:25:19 +09:00
Yusuke Shinyama
02ad086f6a
fixed: HTMLConverter.
2013-10-25 18:10:40 +09:00
Yusuke Shinyama
87842233b3
Version bump!
2013-10-22 22:19:38 +09:00
Yusuke Shinyama
d3730a29ec
API change: process_pdf -> PDFPage.get_pages
2013-10-22 18:59:16 +09:00
Yusuke Shinyama
e927bd307e
fixed: https://github.com/euske/pdfminer/issues/8
2013-10-22 18:24:39 +09:00
Yusuke Shinyama
2aa757978b
Reverted to Python2.x syntax. Fixed LZW decoding.
2013-10-19 08:19:40 +09:00
Yusuke Shinyama
bfd9e93c12
Merge branch 'master' of https://github.com/JordanReiter/pdfminer into JordanReiter-master
2013-10-19 07:46:45 +09:00
Yusuke Shinyama
8e4c0c88e3
fixed: https://github.com/euske/pdfminer/issues/26
2013-10-17 23:20:08 +09:00
Yusuke Shinyama
0ea08890d4
renamed: python2 -> python.
2013-10-17 23:05:27 +09:00
Yusuke Shinyama
8d42eec94d
in_cmap is on by default.
2013-10-17 21:40:43 +09:00
Yusuke Shinyama
de9f9715e3
Added: Adobe-UCS
2013-10-17 21:35:25 +09:00
Yusuke Shinyama
1455f134c6
Fixed: missing ObjStm due to invalid seek.
2013-10-10 20:10:57 +09:00
Yusuke Shinyama
f85c374cae
Separated PDFPage to pdfpage.py.
2013-10-10 19:54:55 +09:00
Yusuke Shinyama
2df67d85ae
Expand ObjStm in XRefFallback.
2013-10-10 19:40:43 +09:00
Yusuke Shinyama
e4bc4e43b1
Code cleanup.
2013-10-10 19:17:58 +09:00
Yusuke Shinyama
cfd60eafbf
Removed PDFDocument.read_xref().
2013-10-10 18:57:08 +09:00
Yusuke Shinyama
658be970b8
Separated PDFXRefFallback.
2013-10-10 18:44:12 +09:00
Yusuke Shinyama
c926874d20
API Change: the PDFDocument cstr now takes PDFParser. set_parser() is removed.
2013-10-10 18:40:06 +09:00
Yusuke Shinyama
557c2c72e6
Removed ObjIdRange for terseness.
2013-10-10 18:34:43 +09:00
Yusuke Shinyama
2221163b94
Split pdfparser.py and pdfdocument.py.
2013-10-10 18:29:30 +09:00
Yusuke Shinyama
1467fc674c
Added fallback for broken PDFs.
2013-10-09 22:45:54 +09:00
Yusuke Shinyama
eabe72ee63
Prevent crash with empty layout box.
2013-10-09 22:13:22 +09:00
Yusuke Shinyama
87143cb36f
Fallback when /Pages does not exist.
2013-10-09 22:08:16 +09:00
Yusuke Shinyama
06425bba00
Introducing PDFObjectNotFound
2013-10-09 21:39:23 +09:00
Yusuke Shinyama
3c3cba2ecc
Moved: import PIL.
2013-04-09 18:42:32 +09:00
Yusuke Shinyama
19e7d70ac1
Merge pull request #15 from jcushman/patch-1
2x faster layout analysis: Use set instead of list for Plane's internal collection of objects.
2013-04-09 02:39:46 -07:00
Yusuke Shinyama
4faccff9c9
Merge pull request #16 from jcushman/master
2x faster group_textboxes function.
2013-04-09 01:58:56 -07:00
Yusuke Shinyama
d8bc13b3af
Merge pull request #13 from gendoc/master
PDFDocument.lookup_name.lookup isn't searching for 'Names' key.
2013-04-09 01:55:54 -07:00
Jordan Reiter
e28b75a462
StringIO
2013-03-27 13:14:58 -04:00
Jordan Reiter
44653071c3
Fixes for LZW error (see https://bitbucket.org/hsoft/pdfminer3k/commits/ae9a4ca0691a/ )
2013-03-27 13:05:29 -04:00
jcushman
f77f196cd3
2x faster group_textboxes function.
2012-06-22 18:11:45 -03:00
jcushman
da3f023b2d
Use set instead of list for Plane's internal collection of objects.
2012-06-22 16:36:33 -03:00
Humberto Pereira
89c81db295
PDFDocument.lookup_names.lookup didn't find 'Names' in some files
2012-03-19 16:42:58 -03:00
Jim Morrison
6413eb7de4
Deal with CMYK images by converting them to RGB. PIL does not invert CMYK images as of PIL 1.1.7, so the invert happens in ImageWriter.
2012-01-24 16:18:36 -08:00
Yusuke Shinyama
c7709045e9
fixed: invalid bmp file output
2011-11-08 00:29:24 +10:00
Yusuke Shinyama
82ff98c7b3
imagewriter now works with text output
2011-11-07 01:15:10 +10:00
Yusuke Shinyama
91174b5665
avoid crash when colorspace is null.
2011-11-06 20:10:48 +10:00
Yusuke Shinyama
3d1652963a
Merge github.com:euske/pdfminer
2011-10-30 15:44:49 +10:00
dwilson
60dbf6bb69
avoids crash in pdf syntax error for missing ids
when an object id is out of range, rather than crashing, only raise a
pdf syntax error if STRICT is enabled and return None otherwise
2011-08-31 17:03:10 -04:00
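A sketch of the behavior described in this commit, assuming a hypothetical in-memory object table and an explicit `strict` argument. `PDFSyntaxError` here is a local stand-in for the library's exception of that name, and `get_object` is a hypothetical helper, not pdfminer's lookup code.

```python
from typing import Any, Dict, Optional


class PDFSyntaxError(Exception):
    """Local stand-in for the library's syntax-error exception."""


def get_object(objects: Dict[int, Any], objid: int, strict: bool = False) -> Optional[Any]:
    """Hypothetical lookup: raise only in strict mode, otherwise return None."""
    if objid not in objects:
        if strict:
            raise PDFSyntaxError("object id out of range: %r" % objid)
        return None
    return objects[objid]


table = {1: "catalog"}
print(get_object(table, 1))   # catalog
print(get_object(table, 99))  # None
```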
Yusuke Shinyama
f638784e1d
experimental layout analysis improvements
2011-08-14 09:44:21 +09:00
Yusuke Shinyama
cbb8d869c7
removed initial cmap/ directory
2011-07-31 18:05:07 +10:00
Yusuke Shinyama
cdef0d7883
Merge github.com:euske/pdfminer
2011-07-31 17:47:20 +10:00
Yusuke Shinyama
46bb0107aa
fixed: crash due to small layout elements (thanks to hsoft)
2011-07-31 17:44:09 +10:00
Yusuke Shinyama
eec317ae10
Merge pull request #6 from rsennrich/master
cleaner widths for Adobe core 14 fonts. (thanks to rsennrich)
2011-07-31 00:39:36 -07:00
Yusuke Shinyama
24cd161fb7
CCITTFaxFilter.reversed fix
2011-07-31 17:36:02 +10:00
Rico
6e4f36d9a1
get width based on utf-8 char.
fills some gaps and fixes inconsistencies between standard encodings
2011-07-23 16:34:11 +02:00
Yusuke Shinyama
dc8fde0e47
added CCITTFaxFilter support and a very crude image extraction.
2011-07-18 21:07:00 +10:00
Yusuke Shinyama
2707ba75df
added CCITTFaxFilter support and a very crude image extraction.
2011-07-18 21:06:50 +10:00
Yusuke Shinyama
fda6f7ba5d
ccitt.py added.
2011-07-18 17:36:37 +10:00