362 Commits (a636cbcfd42024cc0b260cce3e5d48c451e53323)

Author SHA1 Message Date
Yusuke Shinyama d3730a29ec API change: process_pdf -> PDFPage.get_pages 2013-10-22 18:59:16 +09:00
Yusuke Shinyama e927bd307e fixed: https://github.com/euske/pdfminer/issues/8 2013-10-22 18:24:39 +09:00
Yusuke Shinyama 2aa757978b Reverted to Python2.x syntax. Fixed LZW decoding. 2013-10-19 08:19:40 +09:00
Yusuke Shinyama bfd9e93c12 Merge branch 'master' of https://github.com/JordanReiter/pdfminer into JordanReiter-master 2013-10-19 07:46:45 +09:00
Yusuke Shinyama 8e4c0c88e3 fixed: https://github.com/euske/pdfminer/issues/26 2013-10-17 23:20:08 +09:00
Yusuke Shinyama 0ea08890d4 renamed: python2 -> python. 2013-10-17 23:05:27 +09:00
Yusuke Shinyama 8d42eec94d in_cmap is on by default. 2013-10-17 21:40:43 +09:00
Yusuke Shinyama de9f9715e3 Added: Adobe-UCS 2013-10-17 21:35:25 +09:00
Yusuke Shinyama 1455f134c6 Fixed: missing ObjStm due to invalid seek. 2013-10-10 20:10:57 +09:00
Yusuke Shinyama f85c374cae Separated PDFPage to pdfpage.py. 2013-10-10 19:54:55 +09:00
Yusuke Shinyama 2df67d85ae Expand ObjStm in XRefFallback. 2013-10-10 19:40:43 +09:00
Yusuke Shinyama e4bc4e43b1 Code cleanup. 2013-10-10 19:17:58 +09:00
Yusuke Shinyama cfd60eafbf Removed PDFDocument.read_xref(). 2013-10-10 18:57:08 +09:00
Yusuke Shinyama 658be970b8 Separated PDFXRefFallback. 2013-10-10 18:44:12 +09:00
Yusuke Shinyama c926874d20 API Change: the PDFDocument cstr now takes PDFParser. set_parser() is removed. 2013-10-10 18:40:06 +09:00
Yusuke Shinyama 557c2c72e6 Removed ObjIdRange for terseness. 2013-10-10 18:34:43 +09:00
Yusuke Shinyama 2221163b94 Split pdfparser.py and pdfdocument.py. 2013-10-10 18:29:30 +09:00
Yusuke Shinyama 1467fc674c Added fallback for broken PDFs. 2013-10-09 22:45:54 +09:00
Yusuke Shinyama eabe72ee63 Prevent crash with empty layout box. 2013-10-09 22:13:22 +09:00
Yusuke Shinyama 87143cb36f Fallback when /Pages does not exist. 2013-10-09 22:08:16 +09:00
Yusuke Shinyama 06425bba00 Introducing PDFObjectNotFound 2013-10-09 21:39:23 +09:00
Yusuke Shinyama 3c3cba2ecc Moved: import PIL. 2013-04-09 18:42:32 +09:00
Yusuke Shinyama 19e7d70ac1 Merge pull request #15 from jcushman/patch-1
2x faster layout analysis: Use set instead of list for Plane's internal collection of objects.
2013-04-09 02:39:46 -07:00
Yusuke Shinyama 4faccff9c9 Merge pull request #16 from jcushman/master
2x faster group_textboxes function.
2013-04-09 01:58:56 -07:00
Yusuke Shinyama d8bc13b3af Merge pull request #13 from gendoc/master
PDFDocument.lookup_name.lookup isn't searching for 'Names' key.
2013-04-09 01:55:54 -07:00
Jordan Reiter e28b75a462 StringIO 2013-03-27 13:14:58 -04:00
Jordan Reiter 44653071c3 Fixes for LZW error (see https://bitbucket.org/hsoft/pdfminer3k/commits/ae9a4ca0691a/) 2013-03-27 13:05:29 -04:00
jcushman f77f196cd3 2x faster group_textboxes function. 2012-06-22 18:11:45 -03:00
jcushman da3f023b2d Use set instead of list for Plane's internal collection of objects. 2012-06-22 16:36:33 -03:00
Humberto Pereira 89c81db295 PDFDocument.lookup_names.lookup didn't find 'Names' in some files 2012-03-19 16:42:58 -03:00
Jim Morrison 6413eb7de4 Deal with CMYK images by converting them to RGB. PIL does not invert CMYK images as of PIL 1.1.7, so the invert happens in ImageWriter. 2012-01-24 16:18:36 -08:00
Yusuke Shinyama c7709045e9 fixed: invalid bmp file output 2011-11-08 00:29:24 +10:00
Yusuke Shinyama 82ff98c7b3 imagewriter now works with text output 2011-11-07 01:15:10 +10:00
Yusuke Shinyama 91174b5665 avoid crash when colorspace is null. 2011-11-06 20:10:48 +10:00
Yusuke Shinyama 3d1652963a Merge github.com:euske/pdfminer 2011-10-30 15:44:49 +10:00
dwilson 60dbf6bb69 avoids crash in pdf syntax error for missing ids
when an object id is out of range, rather than crashing, only raise a
pdf syntax error if STRICT is enabled and return None otherwise
2011-08-31 17:03:10 -04:00
Yusuke Shinyama f638784e1d experimental layout analysis improvements 2011-08-14 09:44:21 +09:00
Yusuke Shinyama cbb8d869c7 removed initial cmap/ directory 2011-07-31 18:05:07 +10:00
Yusuke Shinyama cdef0d7883 Merge github.com:euske/pdfminer 2011-07-31 17:47:20 +10:00
Yusuke Shinyama 46bb0107aa fixed: crash due to small layout elements (thanks to hsoft) 2011-07-31 17:44:09 +10:00
Yusuke Shinyama eec317ae10 Merge pull request #6 from rsennrich/master
cleaner widths for Adobe core 14 fonts. (thanks to rsennrich)
2011-07-31 00:39:36 -07:00
Yusuke Shinyama 24cd161fb7 CCITTFaxFilter.reversed fix 2011-07-31 17:36:02 +10:00
Rico 6e4f36d9a1 get width based on utf-8 char.
fills some gaps and fixes inconsistencies between standard encodings
2011-07-23 16:34:11 +02:00
Yusuke Shinyama dc8fde0e47 added CCITTFaxFilter support and a very crude image extraction. 2011-07-18 21:07:00 +10:00
Yusuke Shinyama 2707ba75df added CCITTFaxFilter support and a very crude image extraction. 2011-07-18 21:06:50 +10:00
Yusuke Shinyama fda6f7ba5d ccitt.py added. 2011-07-18 17:36:37 +10:00
Yusuke Shinyama 0278076ea8 PNG predictor added 2011-06-07 00:46:33 +09:00
Yusuke Shinyama 18a5058af6 separated predictor functions. 2011-06-07 00:31:03 +09:00
Yusuke Shinyama 170c97a12b colorspace patch by Lieb Simon 2011-06-06 17:10:12 +09:00
Yusuke Shinyama 2e8180ddee documentation update and version bump 2011-05-15 01:37:14 +09:00
Yusuke Shinyama c134596e2f code cleanup and testcase stabilization 2011-05-15 01:22:19 +09:00
Yusuke Shinyama e5d02f8653 fixed the infinite recursion bug. 2011-05-14 16:32:09 +09:00
Yusuke Shinyama 0c41b8348e code cleanup 2011-05-14 15:51:40 +09:00
Yusuke Shinyama 038ce4cd0c added LTText.get_text() and .text property is no longer accessible. 2011-05-14 15:45:08 +09:00
Yusuke Shinyama 5004e4b28d layout analysis speedup. 2011-05-14 14:17:39 +09:00
Yusuke Shinyama 095534b294 figure object now does not call analyze. 2011-05-14 14:17:22 +09:00
Yusuke Shinyama b8d516fc52 extended Plane class. 2011-05-14 14:16:40 +09:00
Yusuke Shinyama fcf0d74ecc tweaks for debugging 2011-04-21 22:07:52 +09:00
Yusuke Shinyama 8f9684f6a6 code cleanup: layout analysis 2011-04-21 22:07:04 +09:00
Yusuke Shinyama 0e660dd385 rename: LTPolygon -> LTCurve 2011-04-20 22:05:25 +09:00
Yusuke Shinyama dab70855bf LTLine is now strictly horizontal or vertical. 2011-04-20 22:01:54 +09:00
Jonathan J Hunt ec682539da Optimized memory usage in TextConverter by ignoring all drawing commands. 2011-03-07 15:11:31 +10:00
Yusuke Shinyama 4918d59bc2 disable caching support 2011-03-03 00:04:43 +09:00
Yusuke Shinyama 18e782f330 canonicalize package names 2011-03-02 23:43:03 +09:00
Yusuke Shinyama bb26cf9180 eliminate empty textboxes 2011-03-01 20:47:20 +09:00
Yusuke Shinyama dfd621b98c minor bugfix. thanks to Hiroshi Manabe. 2011-02-28 19:50:07 +09:00
Yusuke Shinyama f22b056454 release-20110227 2011-02-27 19:53:12 +09:00
Yusuke Shinyama a8bf9b159e docstring fix 2011-02-27 13:09:12 +09:00
Yusuke Shinyama cabaa10e4f layout analysis improvement 2011-02-27 12:56:28 +09:00
Yusuke Shinyama 7dbb664db3 code cleanup and more debugging options 2011-02-14 23:42:05 +09:00
Yusuke Shinyama f00f1dbd04 better layout analysis 2011-02-14 23:41:23 +09:00
Yusuke Shinyama b2d13db29a code cleanup 2011-02-14 22:51:20 +09:00
Yusuke Shinyama cd412308bd text flow detection bug fix (thanks to fujimoto-san) 2011-02-14 22:32:55 +09:00
Yusuke Shinyama cbd58121e3 fix aggressive vertical writing detection (which ruins layout) 2011-02-02 23:09:34 +09:00
Yusuke Shinyama 109aedeb43 cfffont extension with no luck 2011-01-25 00:19:07 +09:00
Yusuke Shinyama 4eb6083c09 code cleanup 2011-01-03 18:11:22 +09:00
Yusuke Shinyama 16b2a87b24 CMAP_PATH environment variable support 2011-01-03 18:11:16 +09:00
Yusuke Shinyama 420169a692 release 20101226 2010-12-26 19:06:47 +09:00
Yusuke Shinyama a24c452ba2 boxes_flow patch by Daniel Gerber 2010-12-26 17:26:39 +09:00
Yusuke Shinyama 3da3adad9b method renamed: finish(self) -> analyze(self, laparams). 2010-12-26 16:56:21 +09:00
yusuke.shinyama.dummy 84ed94aec0 another bugfix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@281 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:41:03 +00:00
yusuke.shinyama.dummy 9bba7ac08b oops, forgot to fix this
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@280 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:40:58 +00:00
yusuke.shinyama.dummy f4ced29713 bugfix by Kevin Brubeck Unhammer
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@278 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:40:45 +00:00
yusuke.shinyama.dummy 2bf9c23801 check_extractable parameter added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@276 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:28 +00:00
yusuke.shinyama.dummy 9f78915ea6 show cid for unknown characters
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@275 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:19 +00:00
yusuke.shinyama.dummy 7374b81383 htmlconverter improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@274 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 15:04:28 +00:00
yusuke.shinyama.dummy fb4ce96309 add font-family
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@273 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:50 +00:00
yusuke.shinyama.dummy 476ecf7e32 add html exact layout mode; default changed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@272 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:41 +00:00
yusuke.shinyama.dummy 08c5c66917 add debugging features
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@271 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:34 +00:00
yusuke.shinyama.dummy 434b24b6e5 remove unused method
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@270 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:27 +00:00
yusuke.shinyama.dummy 0d1f00fa9b improved layout analysis for vertical script
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@269 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:14 +00:00
yusuke.shinyama.dummy 9584845358 layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@268 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:05 +00:00
yusuke.shinyama.dummy edbd3764a7 html layout output fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@267 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:48 +00:00
yusuke.shinyama.dummy 1904b61355 documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@266 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:40 +00:00
yusuke.shinyama.dummy 1a25c61a9f fix empty hexstring bug and test cases.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@265 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-27 12:29:00 +00:00
yusuke.shinyama.dummy 509ab66319 stay with python2
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy 438b4953be documentation bit and code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@263 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:49 +00:00
yusuke.shinyama.dummy 71863aec67 minor bugfix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@262 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:43 +00:00
yusuke.shinyama.dummy 6a4b70f54a code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@261 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:38 +00:00
yusuke.shinyama.dummy 98442ed943 update the version number and documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@256 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:58 +00:00
yusuke.shinyama.dummy cc139db8a7 bugfix LTChar.is_vertical undefined. verticality is now handled by LTTextBox
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@254 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:23 +00:00
yusuke.shinyama.dummy 21f6cf8fb6 removed PDFStream.decomp(). turned out zlib can handle trailing bytes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@253 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:18 +00:00
yusuke.shinyama.dummy 0ecd0b8f9d attempt to recover encoding info from texfont
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@252 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:12 +00:00
yusuke.shinyama.dummy afe33312c6 outline bug fixed
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@249 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:14:52 +00:00
yusuke.shinyama.dummy 0b962443ed patch by Alexander Garden
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@248 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:14:46 +00:00
yusuke.shinyama.dummy 69d9d85685 nunpack TypeError fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@246 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:52 +00:00
yusuke.shinyama.dummy 3305c07ba2 layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@245 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:39 +00:00
yusuke.shinyama.dummy bc1303e901 layout analysis improvement 1
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@244 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:33 +00:00
yusuke.shinyama.dummy 3b2aabaa10 version bump
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@243 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 07:00:01 +00:00
yusuke.shinyama.dummy 0944cfaded test file simple3.pdf added.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@240 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:41 +00:00
yusuke.shinyama.dummy 83d2086f19 fix minor layout issue
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@239 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:31 +00:00
yusuke.shinyama.dummy b871331659 improvement in fallback
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@238 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:24 +00:00
yusuke.shinyama.dummy 4554705881 glyphlist bug (due to my misunderstanding of spec.)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@237 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-26 15:02:46 +00:00
yusuke.shinyama.dummy ac74542d1f minor bugfixes
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@234 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-26 15:02:29 +00:00
yusuke.shinyama.dummy 1a8692124f version bump
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@233 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 04:31:12 +00:00
yusuke.shinyama.dummy 2d02833936 release 20100619
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@230 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 03:58:20 +00:00
yusuke.shinyama.dummy f5aff374fc some wordings and documentations
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@229 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 03:56:50 +00:00
yusuke.shinyama.dummy a0dd46bd8e cmap compression patch. thanks to Jakub Wilk
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@228 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-13 13:50:24 +00:00
yusuke.shinyama.dummy 3f831c8104 bugfixes. thanks to Jakub Wilk
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@226 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-13 04:02:30 +00:00
yusuke.shinyama.dummy 702f3088ae unittest failure fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@222 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-06 05:16:29 +00:00
yusuke.shinyama.dummy cf52476f5e remove redundancy
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@221 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-06 05:16:21 +00:00
yusuke.shinyama.dummy fe3bdbfce0 text rise support added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@217 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-05-18 14:57:04 +00:00
yusuke.shinyama.dummy 8e92ddca30 latin2ascii.py was moved as a utility
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@215 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-05-05 05:51:11 +00:00
yusuke.shinyama.dummy 7f587cafec some usage document added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@214 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 13:31:31 +00:00
yusuke.shinyama.dummy eb535d4106 change PDFPageAggregator -> PDFLayoutAnalyzer
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@213 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 13:31:21 +00:00
yusuke.shinyama.dummy 833f859449 move TagExtractor
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@212 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 13:31:11 +00:00
yusuke.shinyama.dummy a16eba30b7 release 20100424
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@210 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 04:32:21 +00:00
yusuke.shinyama.dummy 97848409e5 fix xobject resources bug, thanks to Jose Maria
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@209 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 04:32:03 +00:00
yusuke.shinyama.dummy 9052cd1ea7 better TOC extraction
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@207 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 01:34:18 +00:00
yusuke.shinyama.dummy e77a6ba997 -A (all_texts) option added for layout analysis
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@205 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:30:03 +00:00
yusuke.shinyama.dummy 609c6e1f5f rename: LayoutItem -> LTItem, LayoutContainer -> LTContainer
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@203 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:29:30 +00:00
yusuke.shinyama.dummy c81142aa44 image handling addition (untested)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@202 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:05:02 +00:00
yusuke.shinyama.dummy 71defb2272 documentation bit, ready for release-20100327
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@198 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-27 06:06:09 +00:00
yusuke.shinyama.dummy 5f822f6dcb improved layout analysis.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@197 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-26 11:11:35 +00:00
yusuke.shinyama.dummy 2e5b92c18a writing mode detection
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@196 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-25 11:38:47 +00:00
yusuke.shinyama.dummy e536b3ef11 more bugfixes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@194 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-23 10:29:52 +00:00
yusuke.shinyama.dummy ee34d8d549 bugfix (thanks to Brian Berry).
Remaining TODOs: automatic testing for vertical texts. Various layout analysis tuning.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@193 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 08:36:39 +00:00
yusuke.shinyama.dummy 25636d7c08 release-20100322
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@192 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 06:22:33 +00:00
yusuke.shinyama.dummy 40b36a7c42 consistent test results
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@191 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 06:04:54 +00:00
yusuke.shinyama.dummy a6523d1a9a patch from pietvo.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@190 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 04:46:59 +00:00
yusuke.shinyama.dummy fa13122f09 add regression tests.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@189 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 04:34:52 +00:00
yusuke.shinyama.dummy cd39642abe code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@188 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 04:00:18 +00:00
yusuke.shinyama.dummy e01cb43e31 add novel layout analysis
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@187 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-21 02:21:37 +00:00
yusuke.shinyama.dummy ffaaea0bac layout analysis changed drastically.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@186 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-20 05:43:34 +00:00
yusuke.shinyama.dummy 85c5476623 A couple of bugfixes. Thanks to Sean Manefield.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@185 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-12 13:47:39 +00:00
yusuke.shinyama.dummy 23be96c49e CAUTION! changed the way of internal layout handling.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@184 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-27 03:59:25 +00:00
yusuke.shinyama.dummy 2555b38836 fix typos (patches by sm)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@183 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-15 14:50:19 +00:00
yusuke.shinyama.dummy aad921b382 version bump.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@182 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-13 15:02:34 +00:00
yusuke.shinyama.dummy 2dee2efad9 apply more patches
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@181 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-13 15:00:43 +00:00
yusuke.shinyama.dummy 0424fd8dc9 incorporated some patches by Andre Auzi
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@180 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-07 15:11:24 +00:00
yusuke.shinyama.dummy 538a605ac0 several bugfixes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@179 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-07 03:14:00 +00:00
yusuke.shinyama.dummy 63033599ce release-20100131
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@178 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-31 02:13:30 +00:00
yusuke.shinyama.dummy dda60dcafc integrate TODO html.
reorder the code bit.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@177 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-31 02:12:51 +00:00
yusuke.shinyama.dummy 0f8fe3f19e Page rotation bug fixed.
Various minor fixes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@176 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-31 02:09:28 +00:00
yusuke.shinyama.dummy dc6e5c366d jpeg extraction support added.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@174 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-30 07:30:01 +00:00
yusuke.shinyama.dummy a63d0324ed version bump
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@171 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-04 12:50:59 +00:00
yusuke.shinyama.dummy ef93c4ee75 convert to doctest
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@170 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-04 12:41:23 +00:00
yusuke.shinyama.dummy 98c8367339 warning removal.
code cleanup.
cmap bug fixed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@168 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-01 03:09:26 +00:00
yusuke.shinyama.dummy 7093bdbdfa Added RunLengthDecode filter by Troy Bollinger.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@167 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-12-24 11:51:43 +00:00
yusuke.shinyama.dummy 6590ad42f5 experimental polygon extraction.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@166 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-12-20 02:38:01 +00:00
yusuke.shinyama.dummy fb05e4b990 for release 20091219
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@164 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-12-19 15:10:58 +00:00
yusuke.shinyama.dummy c07bef376d pycdb not needed anymore.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@163 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-12-19 14:21:25 +00:00
yusuke.shinyama.dummy e4b089e327 include cmap
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@162 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-12-19 14:17:00 +00:00
yusuke.shinyama.dummy ed8a5362b9 renamed cmap.py -> cmapdb.py (avoiding future name changes)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@161 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-12-19 06:52:02 +00:00
yusuke.shinyama.dummy 4d905b81b7 release-20091129
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@160 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-29 07:17:36 +00:00
yusuke.shinyama.dummy 2af8eeb3e7 Add a bit of documentation.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@159 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-15 02:42:05 +00:00
yusuke.shinyama.dummy 0298e26acc speed-tweak.diff from Yannick Gingras
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@158 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-14 11:29:40 +00:00
yusuke.shinyama.dummy faa775897c another bugfix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@156 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-07 09:01:11 +00:00
yusuke.shinyama.dummy d260967d12 use of keyword_name instead of directly accessing obj.name
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@155 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-07 00:58:02 +00:00
yusuke.shinyama.dummy ddb78e2698 abbreviation PSLiteralTable.intern -> LIT, PSKeywordTable.intern -> KWD
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@154 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-07 00:55:18 +00:00
yusuke.shinyama.dummy f444c88e3d testing against None with "is", not using "=="
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@153 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-06 15:10:29 +00:00
yusuke.shinyama.dummy 626e36f39c fix typo (pointed by JaredU)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@152 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-06 15:06:59 +00:00
yusuke.shinyama.dummy 6bc2bebb5b More docstrings.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@151 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-04 11:28:32 +00:00
yusuke.shinyama.dummy 827c606f82 more docstrings.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@150 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-04 10:35:16 +00:00
yusuke.shinyama.dummy b0c6068da1 Added docstring by Yannick Gingras.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@149 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-04 09:48:26 +00:00
yusuke.shinyama.dummy 77986b8273 fix CMapDB initialization stuff. more code cleanup.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@148 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-03 13:39:34 +00:00
yusuke.shinyama.dummy 3dd4f1668b source code tidy up
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@147 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-03 01:27:30 +00:00
yusuke.shinyama.dummy 78f7866554 sgml to xml
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@146 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-31 03:04:56 +00:00
yusuke.shinyama.dummy 736a69a4cd password encryption (R2) bug
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@144 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-31 01:41:30 +00:00
yusuke.shinyama.dummy 7790808560 to 4-space indentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-24 04:41:59 +00:00
yusuke.shinyama.dummy e8b1309e76 testcase added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@140 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-24 02:50:07 +00:00
yusuke.shinyama.dummy a1591f6a4d charspace bug fixed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@139 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-23 14:51:40 +00:00
yusuke.shinyama.dummy ee97f18d4e release-20091004
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@138 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-04 03:48:11 +00:00
yusuke.shinyama.dummy 2ed6b90551 text rotation handling fixed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@137 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-03 04:36:54 +00:00
yusuke.shinyama.dummy 6a2cb8148d git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@136 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-09-16 13:46:41 +00:00
yusuke.shinyama.dummy ab425ddb8f git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@135 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-09-16 13:45:23 +00:00
yusuke.shinyama.dummy 3f93fbcefc bugfixes
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@134 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-09-16 12:51:11 +00:00
yusuke.shinyama.dummy 5b02461c6d 20090912
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@133 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-09-12 03:05:49 +00:00
yusuke.shinyama.dummy 3da04c0a04 rectangle handling bug fixed
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@132 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-09-12 02:37:47 +00:00
yusuke.shinyama.dummy 3f18a74e9c fontsize now referring to bbox
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@131 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-09-07 14:25:15 +00:00
yusuke.shinyama.dummy 68e02b57af release 20090830
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@130 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-08-30 01:23:00 +00:00
yusuke.shinyama.dummy b8c6cb8367 git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@129 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-08-26 15:20:44 +00:00
yusuke.shinyama.dummy 16ddc94c77 git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@128 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-08-25 14:08:27 +00:00
yusuke.shinyama.dummy c813854ca2 git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@126 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-08-24 06:54:28 +00:00
yusuke.shinyama.dummy 5306109a0a git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@125 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-07-23 15:27:29 +00:00
yusuke.shinyama.dummy 585dd59b70 git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@124 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-07-23 14:03:58 +00:00
yusuke.shinyama.dummy 57025ee632 git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@122 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-07-21 16:06:50 +00:00
yusuke.shinyama.dummy 9093c340af 20090721
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@121 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-07-21 14:23:23 +00:00
yusuke.shinyama.dummy 8a5bec5065 layout analysis improved.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@120 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-07-21 07:55:19 +00:00
yusuke.shinyama.dummy af63784305 release-20090711
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@118 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-07-11 15:28:12 +00:00