pdfminer.six

Author	SHA1	Message	Date
Guglielmetti Philippe	6d3210d206	pdfdiff tool (and .spec files for compilation with pyinstaller)	2017-11-21 10:48:45 +01:00
Attila Szász	938419c476	Align dumppdf tool to modified data structures. (#73) * Align dumppdf tool to modified data structures. TOC page numbers should also work now, counting from 1. * Update version number.	2017-07-20 20:46:11 +02:00
Hugh Secker-Walker	35a58ee5b5	Add tools/pdfstats.py which counts all LT* types in a PDF (#68)	2017-05-29 09:11:58 +02:00
Hugh Secker-Walker	488545ddc7	Add string expressions to asserts showing local data (#67)	2017-05-29 09:06:09 +02:00
Philippe Guglielmetti	52feb22eeb	Merge remote-tracking branch 'origin/master' Conflicts: MANIFEST.in README.md pdfminer/latin_enc.py pdfminer/pdfdocument.py pdfminer/pdfinterp.py pdfminer/pdfpage.py pdfminer/pdftypes.py pdfminer/psparser.py pdfminer/utils.py samples/Makefile setup.py	2017-01-19 08:03:16 +01:00
Antonio Ercole De Luca	0fdebc6739	Removing all the "#!/usr/bin/env python" lines, they do not need for … (#34) * Removing all the "#!/usr/bin/env python" lines, they do not need for python3, solving issue number: #19. * Restored all the shebangs in the tools and tests folders (because they are real executables) but used "#!/usr/bin/env python" instead of "#!/usr/bin/python" as this blog points out: https://www.peterbe.com/plog/importance-of-env Removed also the shebang from pdfminer/psparser.py file.	2016-11-08 20:01:11 +01:00
Jakub Wilk	5ddbecb551	Fix typos	2016-09-13 16:25:09 +02:00
Friedrich Lindenberg	1d54ecd31c	Make the logger run in a namespace.	2016-05-20 21:12:05 +02:00
Ivan Teoh	2c8f226907	Fix issues #20 - NameError: global name 'ImageWriter' is not defined	2016-04-26 12:38:42 +10:00
Chris Hager	2e1be5721f	removed settings.ENFORCE_CHECK_EXTRACTABLE	2015-11-01 22:34:18 +01:00
Chris Hager	b686dd0139	pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE	2015-11-01 22:28:08 +01:00
Cathal Garvey	268e9fb2bd	Removed typechecking, nothing's exploded yet and argparse does lots of heavy lifting already.	2015-05-30 17:05:28 +01:00
Cathal Garvey	b3553cef10	Cleaning up pdf2txt.py after the partition/move.	2015-05-30 17:03:55 +01:00
Cathal Garvey	cbe270a4bf	Killed the old main function for pdf2txt.py	2015-05-30 16:37:22 +01:00
Cathal Garvey	ead8e778a6	Successfully compartmentalised code, getting closer to moving pdf->text as a module function.	2015-05-30 16:27:58 +01:00
Cathal Garvey	08cb217983	Progress, progress.. not nearly atomic enough, sorry.	2015-05-30 16:14:24 +01:00
Cathal Garvey	1b47bed306	Many changes to make pdf2txt.py work better in Py3, some in that script, others in module! Sorry, changes should have been more atomic. In pdf2txt.py: * Re-wrote main function to use argparse instead of optparse. * Manually tested in Py2/Py3 to get partial consistency. * Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway. * Py2 mode probably unchanged, cannot find any bugs yet... * Kept old main function for posterity, for now. In utils: * Added a few compatibility functions (some string hax required chardet, new dependency): - make_compat_bytes(in_str)-> (py3->bytes | py2->str) - make_compat_str(in_str)-> (str) - compatible_encode_method(bytesorstring, encoding, erraction)-> (str) In pdfdevice: * To handle different output filetypes in Py3, injected lots of calls to new utils methods, as well as some six.PYX checks and logic. These changes are largely responsible for enhanced Py2/Py3 consistency. In converter: * To handle output filetypes in Py2, injected a few checks and fixes particularly around the py2 `str.encode` method and its assumed usual use-analogies in Py3.	2015-05-17 21:08:57 +01:00
cybjit	2639b15ef4	guess argv encoding in py2 using sys.stdin.encoding	2014-09-16 23:17:26 +02:00
cybjit	14585987c3	keep password api unicode, latin1 or utf-8 is encoded in handler	2014-09-16 22:58:25 +02:00
cybjit	714423883c	setup logging for pdf2txt and fix dumppdf	2014-09-12 00:29:31 +02:00
cybjit	ed13f7c47d	conv_cmap py3 compat	2014-09-12 00:29:30 +02:00
cybjit	0a2d90c051	pdf2txt: do not double encode stdout	2014-09-07 18:34:11 +02:00
unknown	28c2a4e6ad	2.7/3.4 encoding corrected	2014-09-04 10:31:33 +02:00
unknown	7b610b34be	tools must be a module to enable scripts tests	2014-09-04 09:47:33 +02:00
unknown	29c07ea770	Python 3.4 support and tests	2014-09-03 15:26:08 +02:00
unknown	a6475b61b4	Python 3.4 support added and tested	2014-09-03 13:17:41 +02:00
Yusuke Shinyama	fe86b4e64e	Changed: StringIO -> io.BytesIO	2014-06-25 19:55:41 +09:00
Yusuke Shinyama	44074b42ea	Added: stripcontrol for XMLConverter (-S option)	2014-06-22 00:33:00 +09:00
Yusuke Shinyama	bb866ae148	Changed: new except syntax (2.6 or above).	2014-06-16 18:50:07 +09:00
Yusuke Shinyama	28e96ba3d0	Use print as a function.	2014-06-15 12:14:33 +09:00
Yusuke Shinyama	1384a3fe8d	Code cleanup: removed some debug flags.	2014-06-14 15:43:10 +09:00
Yusuke Shinyama	17b9b19a26	Fixed for newer version: pdf2html.cgi	2014-04-02 18:54:50 +09:00
Yusuke Shinyama	340387bfc6	Cleanup: isinstance	2014-03-28 17:50:59 +09:00
Yusuke Shinyama	f9079e4c0a	Fixed dumppdf.py issues.	2014-03-24 20:55:00 +09:00
Yusuke Shinyama	bb6f9b6fc9	Added: -R option.	2013-11-25 18:21:19 +09:00
Alex Rothberg	af8c4a6b8f	- only visit each objid once when dumping all objects	2013-11-18 20:41:09 -05:00
Yusuke Shinyama	2b56b2eedf	Merged.	2013-11-07 19:50:41 +09:00
Matthew Duggan	c1da8b835c	PEP8: Remove trailing whitespace	2013-11-07 16:14:53 +09:00
Matthew Duggan	10a68c83bd	Remove unused imports identified by pyflakes	2013-11-07 16:09:44 +09:00
Yusuke Shinyama	d3730a29ec	API change: process_pdf -> PDFPage.get_pages	2013-10-22 18:59:16 +09:00
Yusuke Shinyama	8a70a9f657	fixed: encoding problem with vertical characters.	2013-10-22 18:44:40 +09:00
Yusuke Shinyama	32844507ea	Fixed some style issues.	2013-10-19 08:41:01 +09:00
Yusuke Shinyama	28cb424f8f	Merge pull request #21 from eug48/master dumppdf: support for extracting embedded files using the -E option	2013-10-18 16:23:09 -07:00
Yusuke Shinyama	6ca9ac5434	chmod fix.	2013-10-17 23:06:07 +09:00
Yusuke Shinyama	0ea08890d4	renamed: python2 -> python.	2013-10-17 23:05:27 +09:00
Yusuke Shinyama	6ad82e355c	Beating the codepage dragon.	2013-10-17 22:57:48 +09:00
Yusuke Shinyama	774827b4ce	Code cleanup: conv_cmap.py	2013-10-12 13:20:40 +09:00
Yusuke Shinyama	f85c374cae	Separated PDFPage to pdfpage.py.	2013-10-10 19:54:55 +09:00
Yusuke Shinyama	c926874d20	API Change: the PDFDocument cstr now takes PDFParser. set_parser() is removed.	2013-10-10 18:40:06 +09:00
Yusuke Shinyama	2221163b94	Split pdfparser.py and pdfdocument.py.	2013-10-10 18:29:30 +09:00

127 Commits (6d3210d206125fc537caa7189260d131b9daee6c)