153 Commits (599f0391b5f0f75cd72adf61a5d9db74045ba828)

Author SHA1 Message Date
unknown 29c07ea770 Python 3.4 support and tests 2014-09-03 15:26:08 +02:00
unknown a6475b61b4 Python 3.4 support added and tested 2014-09-03 13:17:41 +02:00
Yusuke Shinyama fe86b4e64e Changed: StringIO -> io.BytesIO 2014-06-25 19:55:41 +09:00
Yusuke Shinyama 44074b42ea Added: stripcontrol for XMLConverter (-S option) 2014-06-22 00:33:00 +09:00
Yusuke Shinyama bb866ae148 Changed: new except syntax (2.6 or above). 2014-06-16 18:50:07 +09:00
Yusuke Shinyama 28e96ba3d0 Use print as a function. 2014-06-15 12:14:33 +09:00
Yusuke Shinyama 1384a3fe8d Code cleanup: removed some debug flags. 2014-06-14 15:43:10 +09:00
Yusuke Shinyama 17b9b19a26 Fixed for newer version: pdf2html.cgi 2014-04-02 18:54:50 +09:00
Yusuke Shinyama 340387bfc6 Cleanup: isinstance 2014-03-28 17:50:59 +09:00
Yusuke Shinyama f9079e4c0a Fixed dumppdf.py issues. 2014-03-24 20:55:00 +09:00
Yusuke Shinyama bb6f9b6fc9 Added: -R option. 2013-11-25 18:21:19 +09:00
Alex Rothberg af8c4a6b8f - only visit each objid once when dumping all objects 2013-11-18 20:41:09 -05:00
Yusuke Shinyama 2b56b2eedf Merged. 2013-11-07 19:50:41 +09:00
Matthew Duggan c1da8b835c PEP8: Remove trailing whitespace 2013-11-07 16:14:53 +09:00
Matthew Duggan 10a68c83bd Remove unused imports identified by pyflakes 2013-11-07 16:09:44 +09:00
Yusuke Shinyama d3730a29ec API change: process_pdf -> PDFPage.get_pages 2013-10-22 18:59:16 +09:00
Yusuke Shinyama 8a70a9f657 fixed: encoding problem with vertical characters. 2013-10-22 18:44:40 +09:00
Yusuke Shinyama 32844507ea Fixed some style issues. 2013-10-19 08:41:01 +09:00
Yusuke Shinyama 28cb424f8f Merge pull request #21 from eug48/master
dumppdf: support for extracting embedded files using the -E option
2013-10-18 16:23:09 -07:00
Yusuke Shinyama 6ca9ac5434 chmod fix. 2013-10-17 23:06:07 +09:00
Yusuke Shinyama 0ea08890d4 renamed: python2 -> python. 2013-10-17 23:05:27 +09:00
Yusuke Shinyama 6ad82e355c Beating the codepage dragon. 2013-10-17 22:57:48 +09:00
Yusuke Shinyama 774827b4ce Code cleanup: conv_cmap.py 2013-10-12 13:20:40 +09:00
Yusuke Shinyama f85c374cae Separated PDFPage to pdfpage.py. 2013-10-10 19:54:55 +09:00
Yusuke Shinyama c926874d20 API Change: the PDFDocument cstr now takes PDFParser. set_parser() is removed. 2013-10-10 18:40:06 +09:00
Yusuke Shinyama 2221163b94 Split pdfparser.py and pdfdocument.py. 2013-10-10 18:29:30 +09:00
Yusuke Shinyama 1467fc674c Added fallback for broken PDFs. 2013-10-09 22:45:54 +09:00
Yusuke Shinyama 06425bba00 Introducing PDFObjectNotFound 2013-10-09 21:39:23 +09:00
eug 925845b172 dumppdf: support for extracting embedded files using the -E option 2013-01-20 13:29:35 +10:00
Yusuke Shinyama 82ff98c7b3 imagewriter now works with text output 2011-11-07 01:15:10 +10:00
Yusuke Shinyama dc8fde0e47 added CCITTFaxFilter support and a very crude image extraction. 2011-07-18 21:07:00 +10:00
Yusuke Shinyama fcf0d74ecc tweaks for debugging 2011-04-21 22:07:52 +09:00
Yusuke Shinyama 4918d59bc2 disable caching support 2011-03-03 00:04:43 +09:00
Yusuke Shinyama 7dbb664db3 code cleanup and more debugging options 2011-02-14 23:42:05 +09:00
Yusuke Shinyama cbd58121e3 fix aggressive vertical writing detection (which ruins layout) 2011-02-02 23:09:34 +09:00
Yusuke Shinyama d3bcc0eef5 another minor fix 2010-12-26 19:30:46 +09:00
Yusuke Shinyama a24c452ba2 boxes_flow patch by Daniel Gerber 2010-12-26 17:26:39 +09:00
Yusuke Shinyama bf44e52cf7 merged 2010-12-25 17:54:17 +09:00
yusuke.shinyama.dummy 866f2bbb75 webapp fixed
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@283 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:41:35 +00:00
yusuke.shinyama.dummy 5d98a27d9c test cases updated
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@282 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:41:11 +00:00
Yusuke Shinyama 432b3829d3 test cases updated 2010-12-24 22:30:25 +09:00
yusuke.shinyama.dummy 2bf9c23801 check_extractable parameter added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@276 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:28 +00:00
yusuke.shinyama.dummy 7374b81383 htmlconverter improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@274 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 15:04:28 +00:00
yusuke.shinyama.dummy 509ab66319 stay with python2
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy afe33312c6 outline bug fixed
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@249 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:14:52 +00:00
yusuke.shinyama.dummy ca5588a702 bugfix by Humberto Pereira
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@241 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:59:50 +00:00
yusuke.shinyama.dummy 4554705881 glyphlist bug (due to my misunderstanding of spec.)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@237 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-26 15:02:46 +00:00
yusuke.shinyama.dummy a0dd46bd8e cmap compression patch. thanks to Jakub Wilk
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@228 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-13 13:50:24 +00:00
yusuke.shinyama.dummy f9c9357547 pdf2html.cgi code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@218 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-05-29 11:51:15 +00:00
yusuke.shinyama.dummy 8e92ddca30 latin2ascii.py was moved as a utility
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@215 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-05-05 05:51:11 +00:00
yusuke.shinyama.dummy eb535d4106 change PDFPageAggregator -> PDFLayoutAnalyzer
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@213 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 13:31:21 +00:00
yusuke.shinyama.dummy 32d65b70f8 trivial change
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@211 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 13:31:03 +00:00
yusuke.shinyama.dummy 97848409e5 fix xobject resources bug, thanks to Jose Maria
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@209 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 04:32:03 +00:00
yusuke.shinyama.dummy 9052cd1ea7 better TOC extraction
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@207 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 01:34:18 +00:00
yusuke.shinyama.dummy e77a6ba997 -A (all_texts) option added for layout analysis
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@205 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:30:03 +00:00
yusuke.shinyama.dummy 2e5b92c18a writing mode detection
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@196 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-25 11:38:47 +00:00
yusuke.shinyama.dummy ee34d8d549 bugfix (thanks to Brian Berry).
Remaining TODOs: automatic testing for vertical texts. Various layout analysis tuning.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@193 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 08:36:39 +00:00
yusuke.shinyama.dummy 2555b38836 fix typos (patches by sm)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@183 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-15 14:50:19 +00:00
yusuke.shinyama.dummy 2dee2efad9 apply more patches
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@181 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-13 15:00:43 +00:00
yusuke.shinyama.dummy 538a605ac0 several bugfixes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@179 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-07 03:14:00 +00:00
yusuke.shinyama.dummy 0f8fe3f19e Page rotation bug fixed.
Various minor fixes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@176 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-31 02:09:28 +00:00
yusuke.shinyama.dummy dc6e5c366d jpeg extraction support added.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@174 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-30 07:30:01 +00:00
yusuke.shinyama.dummy a9d7a00ccd trivial grammar errors
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@173 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-10 07:18:05 +00:00
yusuke.shinyama.dummy 9486303103 pdf2html.cgi
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@169 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-01 14:15:25 +00:00
yusuke.shinyama.dummy 98c8367339 warning removal.
code cleanup.
cmap bug fixed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@168 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-01 03:09:26 +00:00
yusuke.shinyama.dummy fb05e4b990 for release 20091219
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@164 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-12-19 15:10:58 +00:00
yusuke.shinyama.dummy e4b089e327 include cmap
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@162 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-12-19 14:17:00 +00:00
yusuke.shinyama.dummy ed8a5362b9 renamed cmap.py -> cmapdb.py (avoiding future name changes)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@161 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-12-19 06:52:02 +00:00
yusuke.shinyama.dummy 61d4872c3a add -n option to pdf2txt.py
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@157 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-07 09:12:54 +00:00
yusuke.shinyama.dummy faa775897c another bugfix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@156 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-07 09:01:11 +00:00
yusuke.shinyama.dummy f444c88e3d testing against None with "is", not using "=="
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@153 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-06 15:10:29 +00:00
yusuke.shinyama.dummy 77986b8273 fix CMapDB initialization stuff. more code cleanup.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@148 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-03 13:39:34 +00:00
yusuke.shinyama.dummy 78f7866554 sgml to xml
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@146 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-31 03:04:56 +00:00
yusuke.shinyama.dummy 23b8058ad4 outfp closing bug fixed
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@145 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-31 02:09:36 +00:00
yusuke.shinyama.dummy 7790808560 to 4-space indentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-24 04:41:59 +00:00
yusuke.shinyama.dummy 8a5bec5065 layout analysis improved.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@120 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-07-21 07:55:19 +00:00
yusuke.shinyama.dummy 787ae4f814 documentation fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@117 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-07-11 12:42:12 +00:00
yusuke.shinyama.dummy 97dd4dda5e improved clustering
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@116 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-06-20 10:44:00 +00:00
yusuke.shinyama.dummy c7a0894182 auto detect output type
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@115 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-06-20 10:00:51 +00:00
yusuke.shinyama.dummy 8cae56a555 documentation fix.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@108 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-05-17 06:21:08 +00:00
yusuke.shinyama.dummy 173d095522 text spacing bug fixed
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@106 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-05-16 10:42:35 +00:00
yusuke.shinyama.dummy 3e12268bf6 rename package pdflib -> pdfminer.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@103 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-05-16 06:12:01 +00:00
yusuke.shinyama.dummy f628c0d3fe git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@101 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-05-15 14:34:53 +00:00
yusuke.shinyama.dummy 43e5c05307 handle error when an object was not found in dumpxml()
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@92 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-04-26 15:03:47 +00:00
yusuke.shinyama.dummy 6d91453187 text positioning got right.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@87 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-04-18 17:15:49 +00:00
yusuke.shinyama.dummy f8510edffc AsciiHexDecode filter patch incorporated. Thanks to Troy Bollinger.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@86 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-04-08 10:55:01 +00:00
yusuke.shinyama.dummy d11012d9f7 delete unused file
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@85 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-04-08 10:37:13 +00:00
yusuke.shinyama.dummy 162c5f0bfa webapp fixed
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@83 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-04-02 14:24:57 +00:00
yusuke.shinyama.dummy 70e42bff04 encoding bug fixed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@74 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-03-24 16:26:59 +00:00
yusuke.shinyama.dummy b432a3f4ae patch from Troy Bollinger.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@71 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-02-28 05:44:08 +00:00
yusuke.shinyama.dummy 91770edd46 foo
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@59 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-01-10 09:25:03 +00:00
yusuke.shinyama.dummy 24bdd33557 various bugfixes
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@56 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-01-05 04:40:50 +00:00
yusuke.shinyama.dummy 71be16febe wordspace handling improved.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@55 1aa58f4a-7d42-0410-adbc-911cccaed67c
2008-12-25 15:09:54 +00:00
yusuke.shinyama.dummy 33f709a0d8 page number bug fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@54 1aa58f4a-7d42-0410-adbc-911cccaed67c
2008-09-11 14:57:06 +00:00
yusuke.shinyama.dummy 3e5ab3e01b pdf2html webapp added.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@52 1aa58f4a-7d42-0410-adbc-911cccaed67c
2008-09-06 04:51:01 +00:00
yusuke.shinyama.dummy 5d787e9ece outfp unnecessary
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@50 1aa58f4a-7d42-0410-adbc-911cccaed67c
2008-09-06 04:15:51 +00:00
yusuke.shinyama.dummy 649651a174 separate page handling.
version bump up.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@49 1aa58f4a-7d42-0410-adbc-911cccaed67c
2008-08-30 12:47:21 +00:00
yusuke.shinyama.dummy 395a8dc062 tagged pdf extraction supported.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@45 1aa58f4a-7d42-0410-adbc-911cccaed67c
2008-07-27 04:30:37 +00:00
yusuke.shinyama.dummy 9740f26cec outline (TOC) extraction supported.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@42 1aa58f4a-7d42-0410-adbc-911cccaed67c
2008-07-09 15:15:32 +00:00
yusuke.shinyama.dummy cb02051481 several bugfixes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@41 1aa58f4a-7d42-0410-adbc-911cccaed67c
2008-07-03 15:51:44 +00:00