pdfminer.six

Commit Graph

Author	SHA1	Message	Date
Wm Bentley	495c92e050	Move argparse object setup out of main to separate function. As preparation for implementing Sphinx documentation, create a separate function that builds and returns the argparse parser. Move import argparse out of main to the top of the file.	2018-08-12 21:07:52 -07:00
Andy Kluger	ed7d8308d9	-P is not for page numbers, but passwords, so reflect that in the help text	2018-04-03 12:26:01 -04:00
Antonio Ercole De Luca	0fdebc6739	Removing all the "#!/usr/bin/env python" lines, they do not need for … (#34) * Removing all the "#!/usr/bin/env python" lines, they do not need for python3, solving issue number: #19. * Restored all the shebangs in the tools and tests folders (because they are real executables) but used "#!/usr/bin/env python" instead of "#!/usr/bin/python" as this blog points out: https://www.peterbe.com/plog/importance-of-env Removed also the shebang from pdfminer/psparser.py file.	2016-11-08 20:01:11 +01:00
Ivan Teoh	2c8f226907	Fix issues #20 - NameError: global name 'ImageWriter' is not defined	2016-04-26 12:38:42 +10:00
Chris Hager	2e1be5721f	removed settings.ENFORCE_CHECK_EXTRACTABLE	2015-11-01 22:34:18 +01:00
Chris Hager	b686dd0139	pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE	2015-11-01 22:28:08 +01:00
Cathal Garvey	268e9fb2bd	Removed typechecking, nothing's exploded yet and argparse does lots of heavy lifting already.	2015-05-30 17:05:28 +01:00
Cathal Garvey	b3553cef10	Cleaning up pdf2txt.py after the partition/move.	2015-05-30 17:03:55 +01:00
Cathal Garvey	cbe270a4bf	Killed the old main function for pdf2txt.py	2015-05-30 16:37:22 +01:00
Cathal Garvey	ead8e778a6	Successfully compartmentalised code, getting closer to moving pdf->text as a module function.	2015-05-30 16:27:58 +01:00
Cathal Garvey	08cb217983	Progress, progress.. not nearly atomic enough, sorry.	2015-05-30 16:14:24 +01:00
Cathal Garvey	1b47bed306	Many changes to make pdf2txt.py work better in Py3, some in that script, others in module! Sorry, changes should have been more atomic. In pdf2txt.py: * Re-wrote main function to use argparse instead of optparse. * Manually tested in Py2/Py3 to get partial consistency. * Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway. * Py2 mode probably unchanged, cannot find any bugs yet... * Kept old main function for posterity, for now. In utils: * Added a few compatibility functions (some string hax required chardet, new dependency): - make_compat_bytes(in_str)-> (py3->bytes | py2->str) - make_compat_str(in_str)-> (str) - compatible_encode_method(bytesorstring, encoding, erraction)-> (str) In pdfdevice: * To handle different output filetypes in Py3, injected lots of calls to new utils methods, as well as some six.PYX checks and logic. These changes are largely responsible for enhanced Py2/Py3 consistency. In converter: * To handle output filetypes in Py2, injected a few checks and fixes particularly around the py2 `str.encode` method and its assumed usual use-analogies in Py3.	2015-05-17 21:08:57 +01:00
cybjit	2639b15ef4	guess argv encoding in py2 using sys.stdin.encoding	2014-09-16 23:17:26 +02:00
cybjit	14585987c3	keep password api unicode, latin1 or utf-8 is encoded in handler	2014-09-16 22:58:25 +02:00
cybjit	714423883c	setup logging for pdf2txt and fix dumppdf	2014-09-12 00:29:31 +02:00
cybjit	0a2d90c051	pdf2txt: do not double encode stdout	2014-09-07 18:34:11 +02:00
unknown	29c07ea770	Python 3.4 support and tests	2014-09-03 15:26:08 +02:00
Yusuke Shinyama	44074b42ea	Added: stripcontrol for XMLConverter (-S option)	2014-06-22 00:33:00 +09:00
Yusuke Shinyama	1384a3fe8d	Code cleanup: removed some debug flags.	2014-06-14 15:43:10 +09:00
Yusuke Shinyama	bb6f9b6fc9	Added: -R option.	2013-11-25 18:21:19 +09:00
Yusuke Shinyama	d3730a29ec	API change: process_pdf -> PDFPage.get_pages	2013-10-22 18:59:16 +09:00
Yusuke Shinyama	0ea08890d4	renamed: python2 -> python.	2013-10-17 23:05:27 +09:00
Yusuke Shinyama	2221163b94	Split pdfparser.py and pdfdocument.py.	2013-10-10 18:29:30 +09:00
Yusuke Shinyama	82ff98c7b3	imagewriter now works with text output	2011-11-07 01:15:10 +10:00
Yusuke Shinyama	dc8fde0e47	added CCITTFaxFilter support and a very crude image extraction.	2011-07-18 21:07:00 +10:00
Yusuke Shinyama	fcf0d74ecc	tweaks for debugging	2011-04-21 22:07:52 +09:00
Yusuke Shinyama	4918d59bc2	disable caching support	2011-03-03 00:04:43 +09:00
Yusuke Shinyama	7dbb664db3	code cleanup and more debugging options	2011-02-14 23:42:05 +09:00
Yusuke Shinyama	cbd58121e3	fix aggressive vertical writing detection (which ruins layout)	2011-02-02 23:09:34 +09:00
Yusuke Shinyama	d3bcc0eef5	another minor fix	2010-12-26 19:30:46 +09:00
Yusuke Shinyama	a24c452ba2	boxes_flow patch by Daniel Gerber	2010-12-26 17:26:39 +09:00
yusuke.shinyama.dummy	2bf9c23801	check_extractable paramater added git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@276 1aa58f4a-7d42-0410-adbc-911cccaed67c	2010-11-23 10:53:28 +00:00
yusuke.shinyama.dummy	7374b81383	htmlconverter improved git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@274 1aa58f4a-7d42-0410-adbc-911cccaed67c	2010-11-14 15:04:28 +00:00
yusuke.shinyama.dummy	509ab66319	stay with python2 git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c	2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy	eb535d4106	change PDFPageAggregator -> PDFLayoutAnalyzer git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@213 1aa58f4a-7d42-0410-adbc-911cccaed67c	2010-04-24 13:31:21 +00:00
yusuke.shinyama.dummy	97848409e5	fix xobject resources bug, thanks to Jose Maria git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@209 1aa58f4a-7d42-0410-adbc-911cccaed67c	2010-04-24 04:32:03 +00:00
yusuke.shinyama.dummy	e77a6ba997	-A (all_texts) option added for layout analysis git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@205 1aa58f4a-7d42-0410-adbc-911cccaed67c	2010-04-10 11:30:03 +00:00
yusuke.shinyama.dummy	2e5b92c18a	writing mode detection git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@196 1aa58f4a-7d42-0410-adbc-911cccaed67c	2010-03-25 11:38:47 +00:00
yusuke.shinyama.dummy	ee34d8d549	bugfix (thanks to Brian Berry). Remaining TODOs: automatic testing for vertical texts. Various layout analysis tuning. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@193 1aa58f4a-7d42-0410-adbc-911cccaed67c	2010-03-22 08:36:39 +00:00
yusuke.shinyama.dummy	0f8fe3f19e	Page rotation bug fixed. Various minor fixes. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@176 1aa58f4a-7d42-0410-adbc-911cccaed67c	2010-01-31 02:09:28 +00:00
yusuke.shinyama.dummy	dc6e5c366d	jpeg extraction support added. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@174 1aa58f4a-7d42-0410-adbc-911cccaed67c	2010-01-30 07:30:01 +00:00
yusuke.shinyama.dummy	e4b089e327	include cmap git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@162 1aa58f4a-7d42-0410-adbc-911cccaed67c	2009-12-19 14:17:00 +00:00
yusuke.shinyama.dummy	ed8a5362b9	renamed cmap.py -> cmapdb.py (avoiding future name changes) git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@161 1aa58f4a-7d42-0410-adbc-911cccaed67c	2009-12-19 06:52:02 +00:00
yusuke.shinyama.dummy	61d4872c3a	add -n option to pdf2txt.py git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@157 1aa58f4a-7d42-0410-adbc-911cccaed67c	2009-11-07 09:12:54 +00:00
yusuke.shinyama.dummy	77986b8273	fix CMapDB initialization stuff. more code cleanup. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@148 1aa58f4a-7d42-0410-adbc-911cccaed67c	2009-11-03 13:39:34 +00:00
yusuke.shinyama.dummy	78f7866554	sgml to xml git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@146 1aa58f4a-7d42-0410-adbc-911cccaed67c	2009-10-31 03:04:56 +00:00
yusuke.shinyama.dummy	23b8058ad4	outfp closing bug fixed git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@145 1aa58f4a-7d42-0410-adbc-911cccaed67c	2009-10-31 02:09:36 +00:00
yusuke.shinyama.dummy	7790808560	to 4-space indentation git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c	2009-10-24 04:41:59 +00:00
yusuke.shinyama.dummy	8a5bec5065	layout analysis improved. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@120 1aa58f4a-7d42-0410-adbc-911cccaed67c	2009-07-21 07:55:19 +00:00
yusuke.shinyama.dummy	787ae4f814	documentation fix git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@117 1aa58f4a-7d42-0410-adbc-911cccaed67c	2009-07-11 12:42:12 +00:00

66 Commits (2f4518231f0b2f30c14a598948d82b1f24839114)