pdfminer.six


Author	SHA1	Message	Date
Chris Hager	2e1be5721f	removed settings.ENFORCE_CHECK_EXTRACTABLE	2015-11-01 22:34:18 +01:00
Chris Hager	b686dd0139	pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE	2015-11-01 22:28:08 +01:00
Cathal Garvey	268e9fb2bd	Removed typechecking, nothing's exploded yet and argparse does lots of heavy lifting already.	2015-05-30 17:05:28 +01:00
Cathal Garvey	b3553cef10	Cleaning up pdf2txt.py after the partition/move.	2015-05-30 17:03:55 +01:00
Cathal Garvey	cbe270a4bf	Killed the old main function for pdf2txt.py	2015-05-30 16:37:22 +01:00
Cathal Garvey	ead8e778a6	Successfully compartmentalised code, getting closer to moving pdf->text as a module function.	2015-05-30 16:27:58 +01:00
Cathal Garvey	08cb217983	Progress, progress.. not nearly atomic enough, sorry.	2015-05-30 16:14:24 +01:00
Cathal Garvey	1b47bed306	Many changes to make pdf2txt.py work better in Py3, some in that script, others in module! Sorry, changes should have been more atomic. In pdf2txt.py: * Re-wrote main function to use argparse instead of optparse. * Manually tested in Py2/Py3 to get partial consistency. * Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway. * Py2 mode probably unchanged, cannot find any bugs yet... * Kept old main function for posterity, for now. In utils: * Added a few compatibility functions (some string hax required chardet, new dependency): - make_compat_bytes(in_str)-> (py3->bytes | py2->str) - make_compat_str(in_str)-> (str) - compatible_encode_method(bytesorstring, encoding, erraction)-> (str) In pdfdevice: * To handle different output filetypes in Py3, injected lots of calls to new utils methods, as well as some six.PYX checks and logic. These changes are largely responsible for enhanced Py2/Py3 consistency. In converter: * To handle output filetypes in Py2, injected a few checks and fixes particularly around the py2 `str.encode` method and its assumed usual use-analogies in Py3.	2015-05-17 21:08:57 +01:00
cybjit	2639b15ef4	guess argv encoding in py2 using sys.stdin.encoding	2014-09-16 23:17:26 +02:00
cybjit	14585987c3	keep password api unicode, latin1 or utf-8 is encoded in handler	2014-09-16 22:58:25 +02:00
cybjit	714423883c	setup logging for pdf2txt and fix dumppdf	2014-09-12 00:29:31 +02:00
cybjit	ed13f7c47d	conv_cmap py3 compat	2014-09-12 00:29:30 +02:00
cybjit	0a2d90c051	pdf2txt: do not double encode stdout	2014-09-07 18:34:11 +02:00
unknown	28c2a4e6ad	2.7/3.4 encoding corrected	2014-09-04 10:31:33 +02:00
unknown	7b610b34be	tools must be a module to enable scripts tests	2014-09-04 09:47:33 +02:00
unknown	29c07ea770	Python 3.4 support and tests	2014-09-03 15:26:08 +02:00
unknown	a6475b61b4	Python 3.4 support added and tested	2014-09-03 13:17:41 +02:00
Yusuke Shinyama	fe86b4e64e	Changed: StringIO -> io.BytesIO	2014-06-25 19:55:41 +09:00
Yusuke Shinyama	44074b42ea	Added: stripcontrol for XMLConverter (-S option)	2014-06-22 00:33:00 +09:00
Yusuke Shinyama	bb866ae148	Changed: new except syntax (2.6 or above).	2014-06-16 18:50:07 +09:00
Yusuke Shinyama	28e96ba3d0	Use print as a function.	2014-06-15 12:14:33 +09:00
Yusuke Shinyama	1384a3fe8d	Code cleanup: removed some debug flags.	2014-06-14 15:43:10 +09:00
Yusuke Shinyama	17b9b19a26	Fixed for newer version: pdf2html.cgi	2014-04-02 18:54:50 +09:00
Yusuke Shinyama	340387bfc6	Cleanup: isinstance	2014-03-28 17:50:59 +09:00
Yusuke Shinyama	f9079e4c0a	Fixed dumppdf.py issues.	2014-03-24 20:55:00 +09:00
Yusuke Shinyama	bb6f9b6fc9	Added: -R option.	2013-11-25 18:21:19 +09:00
Alex Rothberg	af8c4a6b8f	- only visit each objid once when dumping all objects	2013-11-18 20:41:09 -05:00
Yusuke Shinyama	2b56b2eedf	Merged.	2013-11-07 19:50:41 +09:00
Matthew Duggan	c1da8b835c	PEP8: Remove trailing whitespace	2013-11-07 16:14:53 +09:00
Matthew Duggan	10a68c83bd	Remove unused imports identified by pyflakes	2013-11-07 16:09:44 +09:00
Yusuke Shinyama	d3730a29ec	API change: process_pdf -> PDFPage.get_pages	2013-10-22 18:59:16 +09:00
Yusuke Shinyama	8a70a9f657	fixed: encoding problem with vertical characters.	2013-10-22 18:44:40 +09:00
Yusuke Shinyama	32844507ea	Fixed some style issues.	2013-10-19 08:41:01 +09:00
Yusuke Shinyama	28cb424f8f	Merge pull request #21 from eug48/master dumppdf: support for extracting embedded files using the -E option	2013-10-18 16:23:09 -07:00
Yusuke Shinyama	6ca9ac5434	chmod fix.	2013-10-17 23:06:07 +09:00
Yusuke Shinyama	0ea08890d4	renamed: python2 -> python.	2013-10-17 23:05:27 +09:00
Yusuke Shinyama	6ad82e355c	Beating the codepage dragon.	2013-10-17 22:57:48 +09:00
Yusuke Shinyama	774827b4ce	Code cleanup: conv_cmap.py	2013-10-12 13:20:40 +09:00
Yusuke Shinyama	f85c374cae	Separated PDFPage to pdfpage.py.	2013-10-10 19:54:55 +09:00
Yusuke Shinyama	c926874d20	API Change: the PDFDocument cstr now takes PDFParser. set_parser() is removed.	2013-10-10 18:40:06 +09:00
Yusuke Shinyama	2221163b94	Split pdfparser.py and pdfdocument.py.	2013-10-10 18:29:30 +09:00
Yusuke Shinyama	1467fc674c	Added fallback for broken PDFs.	2013-10-09 22:45:54 +09:00
Yusuke Shinyama	06425bba00	Introducing PDFObjectNotFound	2013-10-09 21:39:23 +09:00
eug	925845b172	dumppdf: support for extracting embedded files using the -E option	2013-01-20 13:29:35 +10:00
Yusuke Shinyama	82ff98c7b3	imagewriter now works with text output	2011-11-07 01:15:10 +10:00
Yusuke Shinyama	dc8fde0e47	added CCITTFaxFilter support and a very crude image extraction.	2011-07-18 21:07:00 +10:00
Yusuke Shinyama	fcf0d74ecc	tweaks for debugging	2011-04-21 22:07:52 +09:00
Yusuke Shinyama	4918d59bc2	disable caching support	2011-03-03 00:04:43 +09:00
Yusuke Shinyama	7dbb664db3	code cleanup and more debugging options	2011-02-14 23:42:05 +09:00
Yusuke Shinyama	cbd58121e3	fix aggressive vertical writing detection (which ruins layout)	2011-02-02 23:09:34 +09:00

118 Commits (f1d5d681b6d2ab0ddeaea925ba784ebb94f6d509)