Commit Graph

362 Commits (a636cbcfd42024cc0b260cce3e5d48c451e53323)

Author SHA1 Message Date
unknown 846cd18186 Python 3.4 support 2014-09-02 15:49:46 +02:00
unknown faea7291a8 tests pass under Py 2.7 and 3.4 2014-09-01 14:16:49 +02:00
Yusuke Shinyama b0e035c24f Style fix: always have an explicit return. 2014-07-15 21:38:29 +09:00
Yusuke Shinyama f5b5e31921 Fixed: DecodeParms array support. 2014-07-09 19:07:27 +09:00
Yusuke Shinyama 137fc3a1ae Use KWD instead of token.name. 2014-06-30 19:15:21 +09:00
Yusuke Shinyama 1ccfaff411 String-Bytes distinction (first attempt). 2014-06-30 19:05:56 +09:00
Yusuke Shinyama 8791355e1d Cleanup imports. Use relative imports. 2014-06-26 18:12:39 +09:00
Yusuke Shinyama 2e900e5d10 Fixed for consistent test results. (hopefully...) 2014-06-26 17:41:31 +09:00
Yusuke Shinyama fe86b4e64e Changed: StringIO -> io.BytesIO 2014-06-25 19:55:41 +09:00
Yusuke Shinyama 44074b42ea Added: stripcontrol for XMLConverter (-S option) 2014-06-22 00:33:00 +09:00
Yusuke Shinyama 81391c09f4 Fixed: #56 (with a derpy fix) 2014-06-18 19:11:45 +09:00
Yusuke Shinyama bb866ae148 Changed: new except syntax (2.6 or above). 2014-06-16 18:50:07 +09:00
Yusuke Shinyama 28e96ba3d0 Use print as a function. 2014-06-15 12:14:33 +09:00
Yusuke Shinyama 0387a6c260 Removed: tuple-unpacking args. 2014-06-15 12:12:13 +09:00
Yusuke Shinyama a8ec99a848 More autotest tweaks. 2014-06-15 10:52:59 +09:00
Yusuke Shinyama 1384a3fe8d Code cleanup: removed some debug flags. 2014-06-14 15:43:10 +09:00
Yusuke Shinyama d9680fca7e Plane: preserve the object order so that the test result is always consistent. 2014-06-14 14:44:53 +09:00
Yusuke Shinyama aed248610c Fixed: dependency on pygame in a unittest. 2014-06-14 12:05:26 +09:00
Yusuke Shinyama 8e14ebf4e1 Use logging module instead of print. 2014-06-14 12:00:49 +09:00
Yusuke Shinyama 8e8e22c095 Fixed a layout bug introduced at c97ec304. 2014-06-13 23:05:04 +09:00
numion a4997d6f10 Implement revision 4 and 5 encryption handler. 2014-05-19 16:27:43 +02:00
Michael R. Hines ae2547b0f2 Stop throwing exception on LITERALS_DCT_DECODE
I have PDF documents with images stream and two filters, don't throw exceptions on the second one (DCT).
2014-05-14 13:25:30 +08:00
Yusuke Shinyama 6b6fc264ff Code refactoring: CMap and UnicodeMap both inherit CMapBase. 2014-04-16 18:57:16 +09:00
Yusuke Shinyama b09c37902f Fixed: issue #48 (thanks to speedplane) 2014-04-09 17:55:50 +09:00
Yusuke Shinyama 7b354c7ab3 Version 20140328 2014-03-28 22:49:18 +09:00
Yusuke Shinyama 340387bfc6 Cleanup: isinstance 2014-03-28 17:50:59 +09:00
Yusuke Shinyama 7849c8724a Fixed: PDFXRefStream.get_objids returns invalid objids. 2014-03-28 17:29:26 +09:00
Yusuke Shinyama 57adad55d7 Revert the wrong fix. 2014-03-28 17:24:03 +09:00
Yusuke Shinyama b18e8c549d Version 20140327 2014-03-28 00:19:52 +09:00
Yusuke Shinyama ee47a6603a Fixed: issues #45 2014-03-28 00:18:17 +09:00
Yusuke Shinyama ab03037444 Version 20140324 2014-03-24 21:03:46 +09:00
Yusuke Shinyama 4b2beba398 Code cleanup. 2014-03-24 20:59:24 +09:00
Yusuke Shinyama f9079e4c0a Fixed dumppdf.py issues. 2014-03-24 20:55:00 +09:00
Yusuke Shinyama 607be269ab Applied a patch by Axel Kaiser. 2014-03-24 20:45:35 +09:00
Yusuke Shinyama d7c4ff28e9 Applied a patch by Axel Kaiser. 2014-03-24 20:39:30 +09:00
Yusuke Shinyama 636d4caeb3 Fixed the PNG predictor bug. Thanks to Gabor Molnar. 2014-03-24 19:57:05 +09:00
Yusuke Shinyama c97ec3048e Changed / to // for clarity. 2013-11-26 21:35:16 +09:00
Yusuke Shinyama b589da51b7 Fix for malformed PDFs. 2013-11-26 21:27:45 +09:00
Yusuke Shinyama cf1e3c9973 Version bump! 2013-11-13 14:52:01 +09:00
Yusuke Shinyama acad011e3f Code cleanup. 2013-11-11 20:46:30 +09:00
Yusuke Shinyama cbef967fbf Renamed: LTAnon -> LTAnno 2013-11-11 19:17:45 +09:00
Yusuke Shinyama c8b6d4112a Fixed: crash with negative layout bbox. 2013-11-09 15:10:14 +09:00
Yusuke Shinyama 2b56b2eedf Merged. 2013-11-07 19:50:41 +09:00
Matthew Duggan 2caa5edc25 PEP8: Whitespace changes to match pep8 2013-11-07 17:35:04 +09:00
Matthew Duggan c1da8b835c PEP8: Remove trailing whitespace 2013-11-07 16:14:53 +09:00
Matthew Duggan 024b821056 Make pyflakes happy by defining variable 2013-11-07 16:10:14 +09:00
Matthew Duggan 10a68c83bd Remove unused imports identified by pyflakes 2013-11-07 16:09:44 +09:00
Yusuke Shinyama 4ef81ae9d8 Improved word spacing. 2013-11-05 18:25:19 +09:00
Yusuke Shinyama 02ad086f6a fixed: HTMLConverter. 2013-10-25 18:10:40 +09:00
Yusuke Shinyama 87842233b3 Version bump! 2013-10-22 22:19:38 +09:00
Yusuke Shinyama d3730a29ec API change: process_pdf -> PDFPage.get_pages 2013-10-22 18:59:16 +09:00
Yusuke Shinyama e927bd307e fixed: https://github.com/euske/pdfminer/issues/8 2013-10-22 18:24:39 +09:00
Yusuke Shinyama 2aa757978b Reverted to Python2.x syntax. Fixed LZW decoding. 2013-10-19 08:19:40 +09:00
Yusuke Shinyama bfd9e93c12 Merge branch 'master' of https://github.com/JordanReiter/pdfminer into JordanReiter-master 2013-10-19 07:46:45 +09:00
Yusuke Shinyama 8e4c0c88e3 fixed: https://github.com/euske/pdfminer/issues/26 2013-10-17 23:20:08 +09:00
Yusuke Shinyama 0ea08890d4 renamed: python2 -> python. 2013-10-17 23:05:27 +09:00
Yusuke Shinyama 8d42eec94d in_cmap is on by default. 2013-10-17 21:40:43 +09:00
Yusuke Shinyama de9f9715e3 Added: Adobe-UCS 2013-10-17 21:35:25 +09:00
Yusuke Shinyama 1455f134c6 Fixed: missing ObjStm due to invalid seek. 2013-10-10 20:10:57 +09:00
Yusuke Shinyama f85c374cae Separated PDFPage to pdfpage.py. 2013-10-10 19:54:55 +09:00
Yusuke Shinyama 2df67d85ae Expand ObjStm in XRefFallback. 2013-10-10 19:40:43 +09:00
Yusuke Shinyama e4bc4e43b1 Code cleanup. 2013-10-10 19:17:58 +09:00
Yusuke Shinyama cfd60eafbf Removed PDFDocument.read_xref(). 2013-10-10 18:57:08 +09:00
Yusuke Shinyama 658be970b8 Separated PDFXRefFallback. 2013-10-10 18:44:12 +09:00
Yusuke Shinyama c926874d20 API Change: the PDFDocument cstr now takes PDFParser. set_parser() is removed. 2013-10-10 18:40:06 +09:00
Yusuke Shinyama 557c2c72e6 Removed ObjIdRange for terseness. 2013-10-10 18:34:43 +09:00
Yusuke Shinyama 2221163b94 Split pdfparser.py and pdfdocument.py. 2013-10-10 18:29:30 +09:00
Yusuke Shinyama 1467fc674c Added fallback for broken PDFs. 2013-10-09 22:45:54 +09:00
Yusuke Shinyama eabe72ee63 Prevent crash with empty layout box. 2013-10-09 22:13:22 +09:00
Yusuke Shinyama 87143cb36f Fallback when /Pages does not exist. 2013-10-09 22:08:16 +09:00
Yusuke Shinyama 06425bba00 Introducing PDFObjectNotFound 2013-10-09 21:39:23 +09:00
Yusuke Shinyama 3c3cba2ecc Moved: import PIL. 2013-04-09 18:42:32 +09:00
Yusuke Shinyama 19e7d70ac1 Merge pull request #15 from jcushman/patch-1
2x faster layout analysis: Use set instead of list for Plane's internal collection of objects.
2013-04-09 02:39:46 -07:00
Yusuke Shinyama 4faccff9c9 Merge pull request #16 from jcushman/master
2x faster group_textboxes function.
2013-04-09 01:58:56 -07:00
Yusuke Shinyama d8bc13b3af Merge pull request #13 from gendoc/master
PDFDocument.lookup_name.lookup isn't searching for 'Names' key.
2013-04-09 01:55:54 -07:00
Jordan Reiter e28b75a462 StringIO 2013-03-27 13:14:58 -04:00
Jordan Reiter 44653071c3 Fixes for LZW error (see https://bitbucket.org/hsoft/pdfminer3k/commits/ae9a4ca0691a/) 2013-03-27 13:05:29 -04:00
jcushman f77f196cd3 2x faster group_textboxes function. 2012-06-22 18:11:45 -03:00
jcushman da3f023b2d Use set instead of list for Plane's internal collection of objects. 2012-06-22 16:36:33 -03:00
Humberto Pereira 89c81db295 PDFDocument.lookup_names.lookup didn't find 'Names' in some files 2012-03-19 16:42:58 -03:00
Jim Morrison 6413eb7de4 Deal with CMYK images by converting them to RGB. PIL does not invert CMYK images as of PIL 1.1.7, so the invert happens in ImageWriter. 2012-01-24 16:18:36 -08:00
Yusuke Shinyama c7709045e9 fixed: invalid bmp file output 2011-11-08 00:29:24 +10:00
Yusuke Shinyama 82ff98c7b3 imagewriter now works with text output 2011-11-07 01:15:10 +10:00
Yusuke Shinyama 91174b5665 avoid crash when colorspace is null. 2011-11-06 20:10:48 +10:00
Yusuke Shinyama 3d1652963a Merge github.com:euske/pdfminer 2011-10-30 15:44:49 +10:00
dwilson 60dbf6bb69 avoids crash in pdf syntax error for missing ids
when an object id is out of range, rather than crashing, only raise a
pdf syntax error if STRICT is enabled and return None otherwise
2011-08-31 17:03:10 -04:00
Yusuke Shinyama f638784e1d experimental layout analysis improvements 2011-08-14 09:44:21 +09:00
Yusuke Shinyama cbb8d869c7 removed initial cmap/ directory 2011-07-31 18:05:07 +10:00
Yusuke Shinyama cdef0d7883 Merge github.com:euske/pdfminer 2011-07-31 17:47:20 +10:00
Yusuke Shinyama 46bb0107aa fixed: crash due to small layout elements (thanks to hsoft) 2011-07-31 17:44:09 +10:00
Yusuke Shinyama eec317ae10 Merge pull request #6 from rsennrich/master
cleaner widths for Adobe core 14 fonts. (thanks to rsennrich)
2011-07-31 00:39:36 -07:00
Yusuke Shinyama 24cd161fb7 CCITTFaxFilter.reversed fix 2011-07-31 17:36:02 +10:00
Rico 6e4f36d9a1 get width based on utf-8 char.
fills some gaps and fixes inconsistencies between standard encodings
2011-07-23 16:34:11 +02:00
Yusuke Shinyama dc8fde0e47 added CCITTFaxFilter support and a very crude image extraction. 2011-07-18 21:07:00 +10:00
Yusuke Shinyama 2707ba75df added CCITTFaxFilter support and a very crude image extraction. 2011-07-18 21:06:50 +10:00
Yusuke Shinyama fda6f7ba5d ccitt.py added. 2011-07-18 17:36:37 +10:00
Yusuke Shinyama 0278076ea8 PNG predictor added 2011-06-07 00:46:33 +09:00
Yusuke Shinyama 18a5058af6 separated predictor functions. 2011-06-07 00:31:03 +09:00
Yusuke Shinyama 170c97a12b colorspace patch by Lieb Simon 2011-06-06 17:10:12 +09:00
Yusuke Shinyama 2e8180ddee documentation update and version bump 2011-05-15 01:37:14 +09:00
Yusuke Shinyama c134596e2f code cleanup and testcase stabilization 2011-05-15 01:22:19 +09:00
Yusuke Shinyama e5d02f8653 fixed the infinite recursion bug. 2011-05-14 16:32:09 +09:00
Yusuke Shinyama 0c41b8348e code cleanup 2011-05-14 15:51:40 +09:00
Yusuke Shinyama 038ce4cd0c added LTText.get_text() and .text property is no longer accessible. 2011-05-14 15:45:08 +09:00
Yusuke Shinyama 5004e4b28d layout analysis speedup. 2011-05-14 14:17:39 +09:00
Yusuke Shinyama 095534b294 figure object now does not call analyze. 2011-05-14 14:17:22 +09:00
Yusuke Shinyama b8d516fc52 extended Plane class. 2011-05-14 14:16:40 +09:00
Yusuke Shinyama fcf0d74ecc tweaks for debugging 2011-04-21 22:07:52 +09:00
Yusuke Shinyama 8f9684f6a6 code cleanup: layout analysis 2011-04-21 22:07:04 +09:00
Yusuke Shinyama 0e660dd385 rename: LTPolygon -> LTCurve 2011-04-20 22:05:25 +09:00
Yusuke Shinyama dab70855bf LTLine is now strictly horizontal or vertical. 2011-04-20 22:01:54 +09:00
Jonathan J Hunt ec682539da Optimized memory usage in TextConverter by ignoring all drawing commands. 2011-03-07 15:11:31 +10:00
Yusuke Shinyama 4918d59bc2 disable caching support 2011-03-03 00:04:43 +09:00
Yusuke Shinyama 18e782f330 canonicalize package names 2011-03-02 23:43:03 +09:00
Yusuke Shinyama bb26cf9180 eliminate empty textboxes 2011-03-01 20:47:20 +09:00
Yusuke Shinyama dfd621b98c minor bugfix. thanks to Hiroshi Manabe. 2011-02-28 19:50:07 +09:00
Yusuke Shinyama f22b056454 release-20110227 2011-02-27 19:53:12 +09:00
Yusuke Shinyama a8bf9b159e docstring fix 2011-02-27 13:09:12 +09:00
Yusuke Shinyama cabaa10e4f layout analysis improvement 2011-02-27 12:56:28 +09:00
Yusuke Shinyama 7dbb664db3 code cleanup and more debugging options 2011-02-14 23:42:05 +09:00
Yusuke Shinyama f00f1dbd04 better layout analysis 2011-02-14 23:41:23 +09:00
Yusuke Shinyama b2d13db29a code cleanup 2011-02-14 22:51:20 +09:00
Yusuke Shinyama cd412308bd text flow detection bug fix (thanks to fujimoto-san) 2011-02-14 22:32:55 +09:00
Yusuke Shinyama cbd58121e3 fix aggressive vertical writing detection (which ruins layout) 2011-02-02 23:09:34 +09:00
Yusuke Shinyama 109aedeb43 cfffont extension with no luck 2011-01-25 00:19:07 +09:00
Yusuke Shinyama 4eb6083c09 code cleanup 2011-01-03 18:11:22 +09:00
Yusuke Shinyama 16b2a87b24 CMAP_PATH environment variable support 2011-01-03 18:11:16 +09:00
Yusuke Shinyama 420169a692 release 20101226 2010-12-26 19:06:47 +09:00
Yusuke Shinyama a24c452ba2 boxes_flow patch by Daniel Gerber 2010-12-26 17:26:39 +09:00
Yusuke Shinyama 3da3adad9b method renamed: finish(self) -> analyze(self, laparams). 2010-12-26 16:56:21 +09:00
yusuke.shinyama.dummy 84ed94aec0 another bugfix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@281 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:41:03 +00:00
yusuke.shinyama.dummy 9bba7ac08b oops, forgot to fix this
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@280 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:40:58 +00:00
yusuke.shinyama.dummy f4ced29713 bugfix by Kevin Brubeck Unhammer
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@278 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:40:45 +00:00
yusuke.shinyama.dummy 2bf9c23801 check_extractable paramater added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@276 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:28 +00:00
yusuke.shinyama.dummy 9f78915ea6 show cid for unknown characters
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@275 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:19 +00:00
yusuke.shinyama.dummy 7374b81383 htmlconverter improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@274 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 15:04:28 +00:00
yusuke.shinyama.dummy fb4ce96309 add font-family
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@273 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:50 +00:00
yusuke.shinyama.dummy 476ecf7e32 add html exect layout mode; default changed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@272 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:41 +00:00
yusuke.shinyama.dummy 08c5c66917 add debugging features
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@271 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:34 +00:00
yusuke.shinyama.dummy 434b24b6e5 remove unused method
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@270 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:27 +00:00
yusuke.shinyama.dummy 0d1f00fa9b improved layout analysis for vertical script
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@269 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:14 +00:00
yusuke.shinyama.dummy 9584845358 layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@268 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:05 +00:00
yusuke.shinyama.dummy edbd3764a7 html layout output fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@267 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:48 +00:00
yusuke.shinyama.dummy 1904b61355 documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@266 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:40 +00:00
yusuke.shinyama.dummy 1a25c61a9f fix empty hexstring bug and test cases.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@265 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-27 12:29:00 +00:00
yusuke.shinyama.dummy 509ab66319 stay with python2
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy 438b4953be documentation bit and code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@263 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:49 +00:00
yusuke.shinyama.dummy 71863aec67 minor bugfix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@262 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:43 +00:00
yusuke.shinyama.dummy 6a4b70f54a code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@261 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:38 +00:00
yusuke.shinyama.dummy 98442ed943 update the version number and documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@256 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:58 +00:00