Commit Graph

245 Commits (aed248610c844e9669fa8501af7495b0a8ab40a0)

Author SHA1 Message Date
Yusuke Shinyama 1467fc674c Added fallback for broken PDFs. 2013-10-09 22:45:54 +09:00
Yusuke Shinyama eabe72ee63 Prevent crash with empty layout box. 2013-10-09 22:13:22 +09:00
Yusuke Shinyama 87143cb36f Fallback when /Pages does not exist. 2013-10-09 22:08:16 +09:00
Yusuke Shinyama 06425bba00 Introducing PDFObjectNotFound 2013-10-09 21:39:23 +09:00
Yusuke Shinyama 3c3cba2ecc Moved: import PIL. 2013-04-09 18:42:32 +09:00
Yusuke Shinyama 19e7d70ac1 Merge pull request #15 from jcushman/patch-1
2x faster layout analysis: Use set instead of list for Plane's internal collection of objects.
2013-04-09 02:39:46 -07:00
Yusuke Shinyama 4faccff9c9 Merge pull request #16 from jcushman/master
2x faster group_textboxes function.
2013-04-09 01:58:56 -07:00
Yusuke Shinyama d8bc13b3af Merge pull request #13 from gendoc/master
PDFDocument.lookup_name.lookup isn't searching for 'Names' key.
2013-04-09 01:55:54 -07:00
Jordan Reiter e28b75a462 StringIO 2013-03-27 13:14:58 -04:00
Jordan Reiter 44653071c3 Fixes for LZW error (see https://bitbucket.org/hsoft/pdfminer3k/commits/ae9a4ca0691a/) 2013-03-27 13:05:29 -04:00
jcushman f77f196cd3 2x faster group_textboxes function. 2012-06-22 18:11:45 -03:00
jcushman da3f023b2d Use set instead of list for Plane's internal collection of objects. 2012-06-22 16:36:33 -03:00
Humberto Pereira 89c81db295 PDFDocument.lookup_names.lookup didn't find 'Names' in some files 2012-03-19 16:42:58 -03:00
Jim Morrison 6413eb7de4 Deal with CMYK images by converting them to RGB. PIL does not invert CMYK images as of PIL 1.1.7, so the invert happens in ImageWriter. 2012-01-24 16:18:36 -08:00
Yusuke Shinyama c7709045e9 fixed: invalid bmp file output 2011-11-08 00:29:24 +10:00
Yusuke Shinyama 82ff98c7b3 imagewriter now works with text output 2011-11-07 01:15:10 +10:00
Yusuke Shinyama 91174b5665 avoid crash when colorspace is null. 2011-11-06 20:10:48 +10:00
Yusuke Shinyama 3d1652963a Merge github.com:euske/pdfminer 2011-10-30 15:44:49 +10:00
dwilson 60dbf6bb69 avoids crash in pdf syntax error for missing ids
when an object id is out of range, rather than crashing, only raise a
pdf syntax error if STRICT is enabled and return None otherwise
2011-08-31 17:03:10 -04:00
Yusuke Shinyama f638784e1d experimental layout analysis improvements 2011-08-14 09:44:21 +09:00
Yusuke Shinyama cbb8d869c7 removed initial cmap/ directory 2011-07-31 18:05:07 +10:00
Yusuke Shinyama cdef0d7883 Merge github.com:euske/pdfminer 2011-07-31 17:47:20 +10:00
Yusuke Shinyama 46bb0107aa fixed: crash due to small layout elements (thanks to hsoft) 2011-07-31 17:44:09 +10:00
Yusuke Shinyama eec317ae10 Merge pull request #6 from rsennrich/master
cleaner widths for Adobe core 14 fonts. (thanks to rsennrich)
2011-07-31 00:39:36 -07:00
Yusuke Shinyama 24cd161fb7 CCITTFaxFilter.reversed fix 2011-07-31 17:36:02 +10:00
Rico 6e4f36d9a1 get width based on utf-8 char.
fills some gaps and fixes inconsistencies between standard encodings
2011-07-23 16:34:11 +02:00
Yusuke Shinyama dc8fde0e47 added CCITTFaxFilter support and a very crude image extraction. 2011-07-18 21:07:00 +10:00
Yusuke Shinyama 2707ba75df added CCITTFaxFilter support and a very crude image extraction. 2011-07-18 21:06:50 +10:00
Yusuke Shinyama fda6f7ba5d ccitt.py added. 2011-07-18 17:36:37 +10:00
Yusuke Shinyama 0278076ea8 PNG predictor added 2011-06-07 00:46:33 +09:00
Yusuke Shinyama 18a5058af6 separated predictor functions. 2011-06-07 00:31:03 +09:00
Yusuke Shinyama 170c97a12b colorspace patch by Lieb Simon 2011-06-06 17:10:12 +09:00
Yusuke Shinyama 2e8180ddee documentation update and version bump 2011-05-15 01:37:14 +09:00
Yusuke Shinyama c134596e2f code cleanup and testcase stabilization 2011-05-15 01:22:19 +09:00
Yusuke Shinyama e5d02f8653 fixed the infinite recursion bug. 2011-05-14 16:32:09 +09:00
Yusuke Shinyama 0c41b8348e code cleanup 2011-05-14 15:51:40 +09:00
Yusuke Shinyama 038ce4cd0c added LTText.get_text() and .text property is no longer accessible. 2011-05-14 15:45:08 +09:00
Yusuke Shinyama 5004e4b28d layout analysis speedup. 2011-05-14 14:17:39 +09:00
Yusuke Shinyama 095534b294 figure object now does not call analyze. 2011-05-14 14:17:22 +09:00
Yusuke Shinyama b8d516fc52 extended Plane class. 2011-05-14 14:16:40 +09:00
Yusuke Shinyama fcf0d74ecc tweaks for debugging 2011-04-21 22:07:52 +09:00
Yusuke Shinyama 8f9684f6a6 code cleanup: layout analysis 2011-04-21 22:07:04 +09:00
Yusuke Shinyama 0e660dd385 rename: LTPolygon -> LTCurve 2011-04-20 22:05:25 +09:00
Yusuke Shinyama dab70855bf LTLine is now strictly horizontal or vertical. 2011-04-20 22:01:54 +09:00
Jonathan J Hunt ec682539da Optimized memory usage in TextConverter by ignoring all drawing commands. 2011-03-07 15:11:31 +10:00
Yusuke Shinyama 4918d59bc2 disable caching support 2011-03-03 00:04:43 +09:00
Yusuke Shinyama 18e782f330 canonicalize package names 2011-03-02 23:43:03 +09:00
Yusuke Shinyama bb26cf9180 eliminate empty textboxes 2011-03-01 20:47:20 +09:00
Yusuke Shinyama dfd621b98c minor bugfix. thanks to Hiroshi Manabe. 2011-02-28 19:50:07 +09:00
Yusuke Shinyama f22b056454 release-20110227 2011-02-27 19:53:12 +09:00
Yusuke Shinyama a8bf9b159e docstring fix 2011-02-27 13:09:12 +09:00
Yusuke Shinyama cabaa10e4f layout analysis improvement 2011-02-27 12:56:28 +09:00
Yusuke Shinyama 7dbb664db3 code cleanup and more debugging options 2011-02-14 23:42:05 +09:00
Yusuke Shinyama f00f1dbd04 better layout analysis 2011-02-14 23:41:23 +09:00
Yusuke Shinyama b2d13db29a code cleanup 2011-02-14 22:51:20 +09:00
Yusuke Shinyama cd412308bd text flow detection bug fix (thanks to fujimoto-san) 2011-02-14 22:32:55 +09:00
Yusuke Shinyama cbd58121e3 fix aggressive vertical writing detection (which ruins layout) 2011-02-02 23:09:34 +09:00
Yusuke Shinyama 109aedeb43 cfffont extension with no luck 2011-01-25 00:19:07 +09:00
Yusuke Shinyama 4eb6083c09 code cleanup 2011-01-03 18:11:22 +09:00
Yusuke Shinyama 16b2a87b24 CMAP_PATH environment variable support 2011-01-03 18:11:16 +09:00
Yusuke Shinyama 420169a692 release 20101226 2010-12-26 19:06:47 +09:00
Yusuke Shinyama a24c452ba2 boxes_flow patch by Daniel Gerber 2010-12-26 17:26:39 +09:00
Yusuke Shinyama 3da3adad9b method renamed: finish(self) -> analyze(self, laparams). 2010-12-26 16:56:21 +09:00
yusuke.shinyama.dummy 84ed94aec0 another bugfix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@281 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:41:03 +00:00
yusuke.shinyama.dummy 9bba7ac08b oops, forgot to fix this
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@280 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:40:58 +00:00
yusuke.shinyama.dummy f4ced29713 bugfix by Kevin Brubeck Unhammer
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@278 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:40:45 +00:00
yusuke.shinyama.dummy 2bf9c23801 check_extractable paramater added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@276 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:28 +00:00
yusuke.shinyama.dummy 9f78915ea6 show cid for unknown characters
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@275 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:19 +00:00
yusuke.shinyama.dummy 7374b81383 htmlconverter improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@274 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 15:04:28 +00:00
yusuke.shinyama.dummy fb4ce96309 add font-family
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@273 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:50 +00:00
yusuke.shinyama.dummy 476ecf7e32 add html exect layout mode; default changed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@272 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:41 +00:00
yusuke.shinyama.dummy 08c5c66917 add debugging features
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@271 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:34 +00:00
yusuke.shinyama.dummy 434b24b6e5 remove unused method
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@270 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:27 +00:00
yusuke.shinyama.dummy 0d1f00fa9b improved layout analysis for vertical script
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@269 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:14 +00:00
yusuke.shinyama.dummy 9584845358 layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@268 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:05 +00:00
yusuke.shinyama.dummy edbd3764a7 html layout output fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@267 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:48 +00:00
yusuke.shinyama.dummy 1904b61355 documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@266 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:40 +00:00
yusuke.shinyama.dummy 1a25c61a9f fix empty hexstring bug and test cases.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@265 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-27 12:29:00 +00:00
yusuke.shinyama.dummy 509ab66319 stay with python2
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy 438b4953be documentation bit and code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@263 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:49 +00:00
yusuke.shinyama.dummy 71863aec67 minor bugfix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@262 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:43 +00:00
yusuke.shinyama.dummy 6a4b70f54a code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@261 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:38 +00:00
yusuke.shinyama.dummy 98442ed943 update the version number and documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@256 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:58 +00:00
yusuke.shinyama.dummy cc139db8a7 bugfix LTChar.is_vertical undefined. verticality is now handled by LTTextBox
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@254 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:23 +00:00
yusuke.shinyama.dummy 21f6cf8fb6 removed PDFStream.decomp(). turned out zlib can handle trailing bytes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@253 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:18 +00:00
yusuke.shinyama.dummy 0ecd0b8f9d attempt to recover encoding info from texfont
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@252 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:12 +00:00
yusuke.shinyama.dummy afe33312c6 outline bug fixed
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@249 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:14:52 +00:00
yusuke.shinyama.dummy 0b962443ed patch by Alexander Garden
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@248 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:14:46 +00:00
yusuke.shinyama.dummy 69d9d85685 nunpack TypeError fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@246 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:52 +00:00
yusuke.shinyama.dummy 3305c07ba2 layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@245 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:39 +00:00
yusuke.shinyama.dummy bc1303e901 layout analysis improvement 1
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@244 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:33 +00:00
yusuke.shinyama.dummy 3b2aabaa10 version bump
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@243 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 07:00:01 +00:00
yusuke.shinyama.dummy 0944cfaded test file simple3.pdf added.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@240 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:41 +00:00
yusuke.shinyama.dummy 83d2086f19 fix minor layout issue
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@239 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:31 +00:00
yusuke.shinyama.dummy b871331659 improvement in fallback
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@238 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:24 +00:00
yusuke.shinyama.dummy 4554705881 glyphlist bug (due to my misunderstanding of spec.)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@237 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-26 15:02:46 +00:00
yusuke.shinyama.dummy ac74542d1f minor bugfixes
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@234 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-26 15:02:29 +00:00
yusuke.shinyama.dummy 1a8692124f version bump
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@233 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 04:31:12 +00:00
yusuke.shinyama.dummy 2d02833936 release 20100619
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@230 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 03:58:20 +00:00
yusuke.shinyama.dummy f5aff374fc some wordings and documentations
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@229 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 03:56:50 +00:00