Chris Hager
2e1be5721f
removed settings.ENFORCE_CHECK_EXTRACTABLE
2015-11-01 22:34:18 +01:00
Chris Hager
b686dd0139
pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE
2015-11-01 22:28:08 +01:00
Cathal Garvey
268e9fb2bd
Removed typechecking, nothing's exploded yet and argparse does lots of heavy lifting already.
2015-05-30 17:05:28 +01:00
Cathal Garvey
b3553cef10
Cleaning up pdf2txt.py after the partition/move.
2015-05-30 17:03:55 +01:00
Cathal Garvey
cbe270a4bf
Killed the old main function for pdf2txt.py
2015-05-30 16:37:22 +01:00
Cathal Garvey
ead8e778a6
Successfully compartmentalised code, getting closer to moving pdf->text as a module function.
2015-05-30 16:27:58 +01:00
Cathal Garvey
08cb217983
Progress, progress.. not nearly atomic enough, sorry.
2015-05-30 16:14:24 +01:00
Cathal Garvey
1b47bed306
Many changes to make pdf2txt.py work better in Py3, some in that script, others in module!
Sorry, changes should have been more atomic.
*In pdf2txt.py:*
* Re-wrote main function to use argparse instead of optparse.
* Manually tested in Py2/Py3 to get partial consistency.
* Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway.
* Py2 mode *probably* unchanged, cannot find any bugs yet...
* Kept old main function for posterity, for now.
*In utils:*
* Added a few compatibility functions (some string hacks required chardet, a new dependency):
  - make_compat_bytes(in_str) -> (py3->bytes | py2->str)
  - make_compat_str(in_str) -> (str)
  - compatible_encode_method(bytesorstring, encoding, erraction) -> (str)
*In pdfdevice:*
* To handle different output filetypes in Py3, injected lots of calls to new utils methods,
as well as some six.PYX checks and logic. These changes are largely responsible for
enhanced Py2/Py3 consistency.
*In converter:*
* To handle output filetypes in Py2, injected a few checks and fixes, particularly around the
py2 `str.encode` method and its *assumed* equivalents in Py3.
2015-05-17 21:08:57 +01:00
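The compatibility helpers named in commit 1b47bed306 could look roughly like the sketch below. This is not the code from that commit: the original reportedly relied on chardet for encoding detection, whereas this version uses only the standard library and assumes UTF-8 with a Latin-1 fallback. The `encoding` parameter and the `PY3` flag are assumptions added for illustration.

```python
import sys

# Assumption: a simple version flag stands in for the six.PY2/six.PY3 checks.
PY3 = sys.version_info[0] >= 3


def make_compat_bytes(in_str):
    """Return bytes on Python 3 and a plain str on Python 2 (assumes UTF-8)."""
    if PY3 and isinstance(in_str, str):
        return in_str.encode("utf-8")
    return in_str


def make_compat_str(in_str, encoding="utf-8"):
    """Return a native str; on Python 3, byte input is decoded first."""
    if PY3 and isinstance(in_str, bytes):
        try:
            return in_str.decode(encoding)
        except UnicodeDecodeError:
            # Latin-1 maps every byte to a code point, so this cannot fail.
            return in_str.decode("latin-1")
    return in_str
```

On Python 3, `make_compat_bytes("abc")` yields a bytes object and `make_compat_str` accepts either bytes or text, which is the kind of normalisation the commit message describes for output handling.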
cybjit
2639b15ef4
guess argv encoding in py2 using sys.stdin.encoding
2014-09-16 23:17:26 +02:00
cybjit
14585987c3
keep password api unicode, latin1 or utf-8 is encoded in handler
2014-09-16 22:58:25 +02:00
cybjit
714423883c
setup logging for pdf2txt and fix dumppdf
2014-09-12 00:29:31 +02:00
cybjit
ed13f7c47d
conv_cmap py3 compat
2014-09-12 00:29:30 +02:00
cybjit
0a2d90c051
pdf2txt: do not double encode stdout
2014-09-07 18:34:11 +02:00
unknown
28c2a4e6ad
2.7/3.4 encoding corrected
2014-09-04 10:31:33 +02:00
unknown
7b610b34be
tools must be a module to enable scripts tests
2014-09-04 09:47:33 +02:00
unknown
29c07ea770
Python 3.4 support and tests
2014-09-03 15:26:08 +02:00
unknown
a6475b61b4
Python 3.4 support added and tested
2014-09-03 13:17:41 +02:00
Yusuke Shinyama
fe86b4e64e
Changed: StringIO -> io.BytesIO
2014-06-25 19:55:41 +09:00
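As a rough illustration of why commit fe86b4e64e may have swapped StringIO for io.BytesIO: an in-memory buffer that receives already-encoded output should hold bytes, and `io.BytesIO` behaves the same way on Python 2.6+ and Python 3. The variable names here are hypothetical and the snippet is a sketch, not code from the repository.

```python
import io

# A bytes buffer accepts encoded output on both Python 2.6+ and Python 3.
buf = io.BytesIO()
buf.write(u"caf\u00e9".encode("utf-8"))

# getvalue() returns everything written so far as a bytes object.
data = buf.getvalue()
```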
Yusuke Shinyama
44074b42ea
Added: stripcontrol for XMLConverter (-S option)
2014-06-22 00:33:00 +09:00
Yusuke Shinyama
bb866ae148
Changed: new except syntax (2.6 or above).
2014-06-16 18:50:07 +09:00
Yusuke Shinyama
28e96ba3d0
Use print as a function.
2014-06-15 12:14:33 +09:00
Yusuke Shinyama
1384a3fe8d
Code cleanup: removed some debug flags.
2014-06-14 15:43:10 +09:00
Yusuke Shinyama
17b9b19a26
Fixed for newer version: pdf2html.cgi
2014-04-02 18:54:50 +09:00
Yusuke Shinyama
340387bfc6
Cleanup: isinstance
2014-03-28 17:50:59 +09:00
Yusuke Shinyama
f9079e4c0a
Fixed dumppdf.py issues.
2014-03-24 20:55:00 +09:00
Yusuke Shinyama
bb6f9b6fc9
Added: -R option.
2013-11-25 18:21:19 +09:00
Alex Rothberg
af8c4a6b8f
- only visit each objid once when dumping all objects
2013-11-18 20:41:09 -05:00
Yusuke Shinyama
2b56b2eedf
Merged.
2013-11-07 19:50:41 +09:00
Matthew Duggan
c1da8b835c
PEP8: Remove trailing whitespace
2013-11-07 16:14:53 +09:00
Matthew Duggan
10a68c83bd
Remove unused imports identified by pyflakes
2013-11-07 16:09:44 +09:00
Yusuke Shinyama
d3730a29ec
API change: process_pdf -> PDFPage.get_pages
2013-10-22 18:59:16 +09:00
Yusuke Shinyama
8a70a9f657
fixed: encoding problem with vertical characters.
2013-10-22 18:44:40 +09:00
Yusuke Shinyama
32844507ea
Fixed some style issues.
2013-10-19 08:41:01 +09:00
Yusuke Shinyama
28cb424f8f
Merge pull request #21 from eug48/master
dumppdf: support for extracting embedded files using the -E option
2013-10-18 16:23:09 -07:00
Yusuke Shinyama
6ca9ac5434
chmod fix.
2013-10-17 23:06:07 +09:00
Yusuke Shinyama
0ea08890d4
renamed: python2 -> python.
2013-10-17 23:05:27 +09:00
Yusuke Shinyama
6ad82e355c
Beating the codepage dragon.
2013-10-17 22:57:48 +09:00
Yusuke Shinyama
774827b4ce
Code cleanup: conv_cmap.py
2013-10-12 13:20:40 +09:00
Yusuke Shinyama
f85c374cae
Separated PDFPage to pdfpage.py.
2013-10-10 19:54:55 +09:00
Yusuke Shinyama
c926874d20
API Change: the PDFDocument constructor now takes a PDFParser. set_parser() is removed.
2013-10-10 18:40:06 +09:00
Yusuke Shinyama
2221163b94
Split pdfparser.py and pdfdocument.py.
2013-10-10 18:29:30 +09:00
Yusuke Shinyama
1467fc674c
Added fallback for broken PDFs.
2013-10-09 22:45:54 +09:00
Yusuke Shinyama
06425bba00
Introducing PDFObjectNotFound
2013-10-09 21:39:23 +09:00
eug
925845b172
dumppdf: support for extracting embedded files using the -E option
2013-01-20 13:29:35 +10:00
Yusuke Shinyama
82ff98c7b3
imagewriter now works with text output
2011-11-07 01:15:10 +10:00
Yusuke Shinyama
dc8fde0e47
added CCITTFaxFilter support and a very crude image extraction.
2011-07-18 21:07:00 +10:00
Yusuke Shinyama
fcf0d74ecc
tweaks for debugging
2011-04-21 22:07:52 +09:00
Yusuke Shinyama
4918d59bc2
disable caching support
2011-03-03 00:04:43 +09:00
Yusuke Shinyama
7dbb664db3
code cleanup and more debugging options
2011-02-14 23:42:05 +09:00
Yusuke Shinyama
cbd58121e3
fix aggressive vertical writing detection (which ruins layout)
2011-02-02 23:09:34 +09:00