512 Commits (08cb217983d09ee5bcba80e918299b70c60d5df0)

Author SHA1 Message Date
Cathal Garvey 08cb217983 Progress, progress.. not nearly atomic enough, sorry. 2015-05-30 16:14:24 +01:00
Cathal Garvey 1b47bed306 Many changes to make pdf2txt.py work better in Py3, some in that script, others in module!
Sorry, changes should have been more atomic.

*In pdf2txt.py:*

* Re-wrote main function to use argparse instead of optparse.
* Manually tested in Py2/Py3 to get partial consistency.
* Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway.
* Py2 mode *probably* unchanged, cannot find any bugs yet...
* Kept old main function for posterity, for now.
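
For readers unfamiliar with the optparse-to-argparse move described above, here is a minimal sketch of what an argparse-based entry point can look like. It is not the code from this commit: the option names (`--outfile`, `--output-type`, `--debug`) and the `build_parser` helper are hypothetical and chosen only for illustration.

```python
import argparse


def build_parser():
    """Build an illustrative parser; every flag here is hypothetical."""
    parser = argparse.ArgumentParser(
        description="Extract text from PDF files (illustrative sketch)."
    )
    # Positional arguments replace optparse's leftover "args" list.
    parser.add_argument("files", nargs="+", help="input PDF paths")
    parser.add_argument("-o", "--outfile", default="-",
                        help="output path, '-' for stdout")
    # 'choices' rejects unknown output types at parse time.
    parser.add_argument("-t", "--output-type", default="text",
                        choices=["text", "html", "xml", "tag"])
    parser.add_argument("-d", "--debug", action="store_true",
                        help="enable debug output")
    return parser


def main(argv=None):
    # Passing argv explicitly keeps the function testable on Py2 and Py3.
    return build_parser().parse_args(argv)
```

argparse is in the standard library from Python 2.7 and 3.2 onward, so one parser definition can serve both interpreters.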

*In utils:*

* Added a few compatibility functions (some string hax required chardet, new dependency):
    - make_compat_bytes(in_str)-> (py3->bytes | py2->str)
    - make_compat_str(in_str)-> (str)
    - compatible_encode_method(bytesorstring, encoding, erraction)-> (str)
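
The three signatures above can be illustrated with a rough sketch. This is not the code from the commit: the function names and parameters come from the list, but the bodies are assumptions. The commit message says the real helpers rely on chardet and (elsewhere in the codebase) six; this sketch uses only the standard library and assumes UTF-8 wherever the original would guess an encoding.

```python
import sys

# Assumption: a plain version check stands in for six.PY2 / six.PY3.
PY3 = sys.version_info[0] >= 3


def make_compat_bytes(in_str):
    """Py3: return bytes. Py2: return the native str unchanged."""
    if PY3 and not isinstance(in_str, bytes):
        # Assumption: UTF-8 is an acceptable encoding for this sketch.
        return in_str.encode("utf-8")
    return in_str


def make_compat_str(in_str):
    """Return the native str type on either interpreter."""
    if PY3 and isinstance(in_str, bytes):
        # Assumption: decode as UTF-8 instead of detecting with chardet.
        return in_str.decode("utf-8", "replace")
    return in_str


def compatible_encode_method(bytesorstring, encoding="utf-8", erraction="ignore"):
    """Mimic a Py2 `str.encode` call site while always yielding a native str."""
    if not PY3:
        return bytesorstring.encode(encoding, erraction)
    if isinstance(bytesorstring, bytes):
        return bytesorstring.decode(encoding, erraction)
    return bytesorstring
```

The default values for `encoding` and `erraction` are also assumptions; the list above gives only the parameter names.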

*In pdfdevice:*

* To handle different output filetypes in Py3, injected lots of calls to new utils methods,
  as well as some six.PYX checks and logic. These changes are largely responsible for
  enhanced Py2/Py3 consistency.

*In converter:*

* To handle output filetypes in Py2, injected a few checks and fixes particularly around the
  py2 `str.encode` method and its *assumed* usual use-analogies in Py3.
2015-05-17 21:08:57 +01:00
Philippe Guglielmetti 448aa08bc4 Merge pull request #4 from enkore/master
Fix utils.decode_text
2014-12-05 09:58:58 +01:00
enkore d0379a2c44 Fix utils.decode_text 2014-12-04 17:09:52 +01:00
Philippe Guglielmetti 0e40264071 Merge pull request #3 from Cybjit/master
Samples and latin1 passwords
2014-09-17 07:22:52 +02:00
cybjit 515687e1bb more xrange to range 2014-09-16 23:17:31 +02:00
cybjit 2639b15ef4 guess argv encoding in py2 using sys.stdin.encoding 2014-09-16 23:17:26 +02:00
cybjit 9b2e29396b apply_png_predictor py3 2014-09-16 22:59:29 +02:00
cybjit ad05121c69 password py3 2014-09-16 22:59:00 +02:00
cybjit 14585987c3 keep password api unicode, latin1 or utf-8 is encoded in handler 2014-09-16 22:58:25 +02:00
cybjit 2260f77b19 fix dict_value usage in strict mode 2014-09-16 22:57:29 +02:00
cybjit 51a361c145 clean up HTMLConverter and XMLConverter encoding 2014-09-16 22:57:00 +02:00
cybjit 2ee7153f6e add python3 in sample Makefile 2014-09-16 22:56:13 +02:00
Goulu f577f76c52 renamed as pdfminer.six in PyPi 2014-09-15 11:10:00 +02:00
Goulu 03de0f4db8 forgot 'six' requirement ... 2014-09-15 10:42:08 +02:00
Goulu 8861d7e0ed version 20140915 pushed to PyPi as pdfminer_six 2014-09-15 10:33:04 +02:00
Philippe Guglielmetti 4f8aa9ff5b Merge pull request #2 from Cybjit/master
CMap fixes and speed improvements
2014-09-12 07:33:06 +02:00
cybjit 714423883c setup logging for pdf2txt and fix dumppdf 2014-09-12 00:29:31 +02:00
cybjit 39942b6642 avoid string formatting when not logging 2014-09-12 00:29:31 +02:00
cybjit 01821c7d1e rename bytes to avoid built-in collision 2014-09-12 00:29:31 +02:00
cybjit 31e6afc7cf faster and simpler bytes implementation 2014-09-12 00:29:30 +02:00
cybjit ed13f7c47d conv_cmap py3 compat 2014-09-12 00:29:30 +02:00
cybjit cba5a42ba8 decipher_all bytes 2014-09-12 00:29:30 +02:00
cybjit 6357e2da80 code2cid uses int, not byte 2014-09-12 00:29:27 +02:00
cybjit 9b0a3ee53e decode cmap font name 2014-09-11 23:30:02 +02:00
Philippe Guglielmetti 7b620b3146 Merge pull request #1 from Cybjit/master
Python 3 text conversion issues
2014-09-09 20:42:37 +02:00
cybjit a6f31a713d cmap bytes and decode 2014-09-07 18:41:04 +02:00
cybjit cc733c8217 fixes for ARC4 2014-09-07 18:38:22 +02:00
cybjit f9a67db89b change xrange to range 2014-09-07 18:36:12 +02:00
cybjit 0a2d90c051 pdf2txt: do not double encode stdout 2014-09-07 18:34:11 +02:00
unknown 28c2a4e6ad 2.7/3.4 encoding corrected 2014-09-04 10:31:33 +02:00
unknown 58b8492783 no logging in travis.ci 2014-09-04 10:19:50 +02:00
unknown 1c93468c7e faster, less verbose tests 2014-09-04 10:02:29 +02:00
unknown 7b610b34be tools must be a module to enable scripts tests 2014-09-04 09:47:33 +02:00
unknown 4ab48d1803 Python 3.4 compatibility + tests 2014-09-04 09:36:19 +02:00
unknown 29c07ea770 Python 3.4 support and tests 2014-09-03 15:26:08 +02:00
unknown a6475b61b4 Python 3.4 support added and tested 2014-09-03 13:17:41 +02:00
unknown 846cd18186 Python 3.4 support 2014-09-02 15:49:46 +02:00
unknown faea7291a8 tests pass under Py 2.7 and 3.4 2014-09-01 14:16:49 +02:00
Yusuke Shinyama b0e035c24f Style fix: always have an explicit return. 2014-07-15 21:38:29 +09:00
Yusuke Shinyama f5b5e31921 Fixed: DecodeParms array support. 2014-07-09 19:07:27 +09:00
Yusuke Shinyama 137fc3a1ae Use KWD instead of token.name. 2014-06-30 19:15:21 +09:00
Yusuke Shinyama 1ccfaff411 String-Bytes distinction (first attempt). 2014-06-30 19:05:56 +09:00
Yusuke Shinyama 8791355e1d Cleanup imports. Use relative imports. 2014-06-26 18:12:39 +09:00
Yusuke Shinyama 2e900e5d10 Fixed for consistent test results. (hopefully...) 2014-06-26 17:41:31 +09:00
Yusuke Shinyama fe86b4e64e Changed: StringIO -> io.BytesIO 2014-06-25 19:55:41 +09:00
Yusuke Shinyama a3ab6c253b Fixed: loose autotesting. 2014-06-25 19:50:20 +09:00
Yusuke Shinyama 107e071508 Drop Python 2.4 support. The oldest supported version is now Python 2.6. 2014-06-25 19:28:54 +09:00
Yusuke Shinyama 44074b42ea Added: stripcontrol for XMLConverter (-S option) 2014-06-22 00:33:00 +09:00
Yusuke Shinyama 81391c09f4 Fixed: #56 (with a derpy fix) 2014-06-18 19:11:45 +09:00