Yusuke Shinyama
177a4ab937
Fixed : #132 (PDFStream.get_filters: support multiple parameterless filters)
2016-09-11 23:52:13 +09:00
Yusuke Shinyama
e95a483790
Merge pull request #134 from speedplane/feature/Fix-Get-Filters
Fix Bug with PDF Stream Decoder
2016-09-11 23:48:42 +09:00
Yusuke Shinyama
64fe538b24
Fixed : #114 (UnicodeEncodeError in PSLiteral)
2016-09-11 23:43:22 +09:00
speedplane
2049462f6f
Revert changes unrelated to this branch.
2016-06-13 23:42:21 -04:00
speedplane
b0b8818a41
Fix a pdfminer bug that occurs when two or more filters are applied to a stream but no parameters are specified. The code previously dropped every filter after the first because of a misapplied zip call.
2016-06-13 23:35:11 -04:00
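
The zip problem described in the commit above can be illustrated with a minimal sketch. The function names `pair_filters_naive` and `pair_filters_padded` are hypothetical and are not pdfminer's API; the sketch only assumes that a single parameter value (here an empty dict) is wrapped in a one-element list before being zipped with the filter names.

```python
# Hypothetical illustration of the truncation; not pdfminer's actual code.

def pair_filters_naive(filters, params):
    # Wrapping a lone parameter object yields a one-element list,
    # and zip() stops at the shortest input.
    if not isinstance(params, list):
        params = [params]
    return list(zip(filters, params))


def pair_filters_padded(filters, params):
    # Repeat the lone parameter object once per filter so that
    # zip() pairs every filter with something.
    if not isinstance(params, list):
        params = [params] * len(filters)
    return list(zip(filters, params))


filters = ["ASCII85Decode", "FlateDecode"]
naive = pair_filters_naive(filters, {})    # only the first filter survives
padded = pair_filters_padded(filters, {})  # both filters are kept
```

With two filters and no parameters, the naive pairing keeps one entry while the padded pairing keeps both, which matches the behaviour the commit message describes.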
Yusuke Shinyama
14fd0fd2d6
Fixed : #84 (fontname was in unicode)
2015-04-05 19:02:02 +09:00
speedplane
806ee603ff
More fixes to layout. The compute-neighbors function for horizontal lines is only intended to find neighbors on differing lines. However, it's entirely possible for horizontal neighbors to appear within the same line.
This commit finds horizontal neighbors in a horizontal line and merges them into a single horizontal line if necessary. This leads to much better text extraction if the PDF was created in an unusual way.
For example (test case coming), I have seen PDFs which are written almost like vertical columns, but the text is entirely horizontal.
2014-12-12 00:36:59 -05:00
speedplane
45170e7183
There are a number of relatively complex changes here. Comments are listed in the order in which the changes appear.
1. When detecting text in a horizontal line, we already add a space between words separated by more than word_margin. Now we only do so if a space is not already there, which prevents multiple spaces from being placed between words.
2. Detect a horizontal line if the line is zero width. This improves detection of horizontal lines when looking for both horizontal and vertical lines.
3. Don't detect a vertical line if the previous letter is whitespace. This prevents double spaces from being caught as vertical lines.
4. Improve upon an unfortunate O(N^2) algorithm which I have seen take many minutes to execute. Unfortunately, while the "fix" reduces algorithmic complexity, it isn't technically correct, so we only apply it when we know things will take a long time.
2014-12-12 00:36:59 -05:00
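
Item 1 of the commit above can be sketched roughly as follows. This is a simplified illustration, not the code from the commit: `join_line` is a hypothetical helper, characters are assumed to be `(text, x0, x1)` tuples sorted left to right, and `word_margin` is treated here as an absolute gap threshold.

```python
# Hypothetical sketch of "insert a space only if one is not already there".

def join_line(chars, word_margin):
    """Join (text, x0, x1) tuples into a string, spacing out wide gaps."""
    out = []
    prev_x1 = None
    for text, x0, x1 in chars:
        # A gap wider than word_margin normally marks a word boundary.
        gap_is_wide = prev_x1 is not None and (x0 - prev_x1) > word_margin
        # Skip the insertion when the previous character is already a space.
        already_spaced = bool(out) and out[-1].isspace()
        if gap_is_wide and not already_spaced:
            out.append(" ")
        out.append(text)
        prev_x1 = x1
    return "".join(out)


with_space = join_line([("a", 0, 1), (" ", 1, 2), ("b", 5, 6)], word_margin=1)
without_space = join_line([("a", 0, 1), ("b", 5, 6)], word_margin=1)
```

In both inputs the result contains exactly one space between the two letters, whether the space came from the PDF or from the gap check.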
Yusuke Shinyama
0112112458
Fixed: crash on invalid chr number.
2014-12-09 22:55:47 +09:00
speedplane
36977fbe08
Add debug flags for much of the debug output.
2014-11-11 23:36:58 -05:00
speedplane
ecc4d05675
Fix a unicode conversion bug.
See https://github.com/euske/pdfminer/issues/75
2014-11-11 23:34:33 -05:00
Yusuke Shinyama
b0e035c24f
Style fix: always have an explicit return.
2014-07-15 21:38:29 +09:00
Yusuke Shinyama
f5b5e31921
Fixed: DecodeParms array support.
2014-07-09 19:07:27 +09:00
Yusuke Shinyama
137fc3a1ae
Use KWD instead of token.name.
2014-06-30 19:15:21 +09:00
Yusuke Shinyama
1ccfaff411
String-Bytes distinction (first attempt).
2014-06-30 19:05:56 +09:00
Yusuke Shinyama
8791355e1d
Cleanup imports. Use relative imports.
2014-06-26 18:12:39 +09:00
Yusuke Shinyama
2e900e5d10
Fixed for consistent test results. (hopefully...)
2014-06-26 17:41:31 +09:00
Yusuke Shinyama
fe86b4e64e
Changed: StringIO -> io.BytesIO
2014-06-25 19:55:41 +09:00
Yusuke Shinyama
44074b42ea
Added: stripcontrol for XMLConverter (-S option)
2014-06-22 00:33:00 +09:00
Yusuke Shinyama
81391c09f4
Fixed : #56 (with a derpy fix)
2014-06-18 19:11:45 +09:00
Yusuke Shinyama
bb866ae148
Changed: new except syntax (2.6 or above).
2014-06-16 18:50:07 +09:00
Yusuke Shinyama
28e96ba3d0
Use print as a function.
2014-06-15 12:14:33 +09:00
Yusuke Shinyama
0387a6c260
Removed: tuple-unpacking args.
2014-06-15 12:12:13 +09:00
Yusuke Shinyama
a8ec99a848
More autotest tweaks.
2014-06-15 10:52:59 +09:00
Yusuke Shinyama
1384a3fe8d
Code cleanup: removed some debug flags.
2014-06-14 15:43:10 +09:00
Yusuke Shinyama
d9680fca7e
Plane: preserve the object order so that the test result is always consistent.
2014-06-14 14:44:53 +09:00
Yusuke Shinyama
aed248610c
Fixed: dependency on pygame in a unittest.
2014-06-14 12:05:26 +09:00
Yusuke Shinyama
8e14ebf4e1
Use logging module instead of print.
2014-06-14 12:00:49 +09:00
Yusuke Shinyama
8e8e22c095
Fixed a layout bug introduced at c97ec304.
2014-06-13 23:05:04 +09:00
numion
a4997d6f10
Implement revision 4 and 5 encryption handler.
2014-05-19 16:27:43 +02:00
Michael R. Hines
ae2547b0f2
Stop throwing exception on LITERALS_DCT_DECODE
I have PDF documents whose image streams use two filters; don't throw an exception on the second one (DCT).
2014-05-14 13:25:30 +08:00
Yusuke Shinyama
6b6fc264ff
Code refactoring: CMap and UnicodeMap both inherit CMapBase.
2014-04-16 18:57:16 +09:00
Yusuke Shinyama
b09c37902f
Fixed: issue #48 (thanks to speedplane)
2014-04-09 17:55:50 +09:00
Yusuke Shinyama
7b354c7ab3
Version 20140328
2014-03-28 22:49:18 +09:00
Yusuke Shinyama
340387bfc6
Cleanup: isinstance
2014-03-28 17:50:59 +09:00
Yusuke Shinyama
7849c8724a
Fixed: PDFXRefStream.get_objids returns invalid objids.
2014-03-28 17:29:26 +09:00
Yusuke Shinyama
57adad55d7
Revert the wrong fix.
2014-03-28 17:24:03 +09:00
Yusuke Shinyama
b18e8c549d
Version 20140327
2014-03-28 00:19:52 +09:00
Yusuke Shinyama
ee47a6603a
Fixed: issue #45
2014-03-28 00:18:17 +09:00
Yusuke Shinyama
ab03037444
Version 20140324
2014-03-24 21:03:46 +09:00
Yusuke Shinyama
4b2beba398
Code cleanup.
2014-03-24 20:59:24 +09:00
Yusuke Shinyama
f9079e4c0a
Fixed dumppdf.py issues.
2014-03-24 20:55:00 +09:00
Yusuke Shinyama
607be269ab
Applied a patch by Axel Kaiser.
2014-03-24 20:45:35 +09:00
Yusuke Shinyama
d7c4ff28e9
Applied a patch by Axel Kaiser.
2014-03-24 20:39:30 +09:00
Yusuke Shinyama
636d4caeb3
Fixed the PNG predictor bug. Thanks to Gabor Molnar.
2014-03-24 19:57:05 +09:00
Yusuke Shinyama
c97ec3048e
Changed / to // for clarity.
2013-11-26 21:35:16 +09:00
Yusuke Shinyama
b589da51b7
Fix for malformed PDFs.
2013-11-26 21:27:45 +09:00
Yusuke Shinyama
cf1e3c9973
Version bump!
2013-11-13 14:52:01 +09:00
Yusuke Shinyama
acad011e3f
Code cleanup.
2013-11-11 20:46:30 +09:00
Yusuke Shinyama
cbef967fbf
Renamed: LTAnon -> LTAnno
2013-11-11 19:17:45 +09:00