Guglielmetti Philippe
6d3210d206
pdfdiff tool (and .spec files for compilation with pyinstaller)
2017-11-21 10:48:45 +01:00
Attila Szász
938419c476
Align dumppdf tool to modified data structures. ( #73 )
* Align dumppdf tool to modified data structures.
TOC page numbers should also work now, counting from 1.
* Update version number.
2017-07-20 20:46:11 +02:00
Hugh Secker-Walker
35a58ee5b5
Add tools/pdfstats.py which counts all LT* types in a PDF ( #68 )
2017-05-29 09:11:58 +02:00
Hugh Secker-Walker
488545ddc7
Add string expressions to asserts showing local data ( #67 )
2017-05-29 09:06:09 +02:00
Philippe Guglielmetti
52feb22eeb
Merge remote-tracking branch 'origin/master'
Conflicts:
MANIFEST.in
README.md
pdfminer/latin_enc.py
pdfminer/pdfdocument.py
pdfminer/pdfinterp.py
pdfminer/pdfpage.py
pdfminer/pdftypes.py
pdfminer/psparser.py
pdfminer/utils.py
samples/Makefile
setup.py
2017-01-19 08:03:16 +01:00
Antonio Ercole De Luca
0fdebc6739
Removing all the "#!/usr/bin/env python" lines, they are not needed for … ( #34 )
* Removed all the "#!/usr/bin/env python" lines, which are not needed for python3; resolves issue #19.
* Restored the shebangs in the tools and tests folders (because those files are real executables), using "#!/usr/bin/env python" instead of "#!/usr/bin/python", as this blog post points out: https://www.peterbe.com/plog/importance-of-env
Also removed the shebang from pdfminer/psparser.py.
2016-11-08 20:01:11 +01:00
Jakub Wilk
5ddbecb551
Fix typos
2016-09-13 16:25:09 +02:00
Friedrich Lindenberg
1d54ecd31c
Make the logger run in a namespace.
2016-05-20 21:12:05 +02:00
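Running the logger in a namespace usually means requesting a named logger from the standard-library `logging` module instead of logging through the root logger. A minimal sketch of that pattern follows; the logger name `pdfminer.example` and the function `parse_chunk` are illustrative assumptions, not taken from the commit.

```python
import logging

# Hypothetical namespaced logger; inside a package module this would
# normally be written as logging.getLogger(__name__).
log = logging.getLogger("pdfminer.example")


def parse_chunk(data):
    # Messages go to the namespaced logger, so an application can adjust
    # levels and handlers for this namespace without touching the root logger.
    log.debug("parsing %d bytes", len(data))
    return len(data)
```

An application could then silence or enable these messages with something like `logging.getLogger("pdfminer").setLevel(logging.WARNING)`.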
Ivan Teoh
2c8f226907
Fix issues #20 - NameError: global name 'ImageWriter' is not defined
2016-04-26 12:38:42 +10:00
Chris Hager
2e1be5721f
removed settings.ENFORCE_CHECK_EXTRACTABLE
2015-11-01 22:34:18 +01:00
Chris Hager
b686dd0139
pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE
2015-11-01 22:28:08 +01:00
Cathal Garvey
268e9fb2bd
Removed typechecking, nothing's exploded yet and argparse does lots of heavy lifting already.
2015-05-30 17:05:28 +01:00
Cathal Garvey
b3553cef10
Cleaning up pdf2txt.py after the partition/move.
2015-05-30 17:03:55 +01:00
Cathal Garvey
cbe270a4bf
Killed the old main function for pdf2txt.py
2015-05-30 16:37:22 +01:00
Cathal Garvey
ead8e778a6
Successfully compartmentalised code, getting closer to moving pdf->text as a module function.
2015-05-30 16:27:58 +01:00
Cathal Garvey
08cb217983
Progress, progress.. not nearly atomic enough, sorry.
2015-05-30 16:14:24 +01:00
Cathal Garvey
1b47bed306
Many changes to make pdf2txt.py work better in Py3, some in that script, others in module!
Sorry, changes should have been more atomic.
*In pdf2txt.py:*
* Re-wrote main function to use argparse instead of optparse.
* Manually tested in Py2/Py3 to get partial consistency.
* Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway.
* Py2 mode *probably* unchanged, cannot find any bugs yet...
* Kept old main function for posterity, for now.
*In utils:*
* Added a few compatibility functions (some string hax required chardet, new dependency):
- make_compat_bytes(in_str)-> (py3->bytes | py2->str)
- make_compat_str(in_str)-> (str)
- compatible_encode_method(bytesorstring, encoding, erraction)-> (str)
*In pdfdevice:*
* To handle different output filetypes in Py3, injected lots of calls to new utils methods,
as well as some six.PYX checks and logic. These changes are largely responsible for
enhanced Py2/Py3 consistency.
*In converter:*
* To handle output filetypes in Py2, injected a few checks and fixes particularly around the
py2 `str.encode` method and its *assumed* usual use-analogies in Py3.
2015-05-17 21:08:57 +01:00
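The optparse-to-argparse rewrite mentioned in the commit above can be pictured roughly as follows. This is only a sketch of the general argparse pattern: the option names (`--outfile`, `--output-type`) and their choices are hypothetical and should not be read as the actual pdf2txt.py interface.

```python
import argparse


def build_parser():
    # All option names below are illustrative assumptions.
    parser = argparse.ArgumentParser(description="Extract text from PDF files (sketch).")
    parser.add_argument("files", nargs="+", help="input PDF paths")
    parser.add_argument("-o", "--outfile", default="-",
                        help="output path, '-' for stdout")
    parser.add_argument("-t", "--output-type", default="text",
                        choices=["text", "html", "xml", "tag"],
                        help="output format")
    return parser


# Passing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(["-t", "html", "a.pdf"])
```

Unlike optparse, argparse handles positional arguments (`files`) and validates `choices` itself, which is the kind of checking the later "Removed typechecking" commit refers to when it says argparse does the heavy lifting.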
cybjit
2639b15ef4
guess argv encoding in py2 using sys.stdin.encoding
2014-09-16 23:17:26 +02:00
cybjit
14585987c3
keep password api unicode, latin1 or utf-8 is encoded in handler
2014-09-16 22:58:25 +02:00
cybjit
714423883c
setup logging for pdf2txt and fix dumppdf
2014-09-12 00:29:31 +02:00
cybjit
ed13f7c47d
conv_cmap py3 compat
2014-09-12 00:29:30 +02:00
cybjit
0a2d90c051
pdf2txt: do not double encode stdout
2014-09-07 18:34:11 +02:00
unknown
28c2a4e6ad
2.7/3.4 encoding corrected
2014-09-04 10:31:33 +02:00
unknown
7b610b34be
tools must be a module to enable scripts tests
2014-09-04 09:47:33 +02:00
unknown
29c07ea770
Python 3.4 support and tests
2014-09-03 15:26:08 +02:00
unknown
a6475b61b4
Python 3.4 support added and tested
2014-09-03 13:17:41 +02:00
Yusuke Shinyama
fe86b4e64e
Changed: StringIO -> io.BytesIO
2014-06-25 19:55:41 +09:00
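The StringIO to io.BytesIO change reflects that encoded output is bytes, not text. A small sketch of the idea, assuming UTF-8 output; the function name `render` is hypothetical:

```python
import io


def render(chunks):
    # Collect encoded output in an in-memory bytes buffer.
    buf = io.BytesIO()
    for chunk in chunks:
        # BytesIO accepts only bytes, so text is encoded explicitly.
        buf.write(chunk.encode("utf-8"))
    return buf.getvalue()
```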
Yusuke Shinyama
44074b42ea
Added: stripcontrol for XMLConverter (-S option)
2014-06-22 00:33:00 +09:00
Yusuke Shinyama
bb866ae148
Changed: new except syntax (2.6 or above).
2014-06-16 18:50:07 +09:00
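For reference, the "new except syntax" is the `except ... as ...` form, available from Python 2.6 and required in Python 3. A minimal illustration; the helper `to_int` is hypothetical:

```python
def to_int(text, default=0):
    try:
        return int(text)
    except ValueError as exc:  # older spelling: "except ValueError, exc"
        # exc holds the exception instance; here we just fall back.
        return default
```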
Yusuke Shinyama
28e96ba3d0
Use print as a function.
2014-06-15 12:14:33 +09:00
Yusuke Shinyama
1384a3fe8d
Code cleanup: removed some debug flags.
2014-06-14 15:43:10 +09:00
Yusuke Shinyama
17b9b19a26
Fixed for newer version: pdf2html.cgi
2014-04-02 18:54:50 +09:00
Yusuke Shinyama
340387bfc6
Cleanup: isinstance
2014-03-28 17:50:59 +09:00
Yusuke Shinyama
f9079e4c0a
Fixed dumppdf.py issues.
2014-03-24 20:55:00 +09:00
Yusuke Shinyama
bb6f9b6fc9
Added: -R option.
2013-11-25 18:21:19 +09:00
Alex Rothberg
af8c4a6b8f
- only visit each objid once when dumping all objects
2013-11-18 20:41:09 -05:00
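Visiting each objid only once is typically done by tracking a set of ids already seen while walking the object references. The sketch below shows that pattern on a plain dictionary; the function `dump_all` and the `objects` mapping are illustrative assumptions and not the dumppdf code.

```python
def dump_all(objects, roots):
    """Return objids in visiting order, each exactly once.

    objects: hypothetical dict mapping objid -> list of referenced objids.
    roots:   objids to start from.
    """
    visited = set()
    order = []
    stack = list(reversed(roots))
    while stack:
        objid = stack.pop()
        if objid in visited:
            # Already dumped; skipping also protects against reference cycles.
            continue
        visited.add(objid)
        order.append(objid)
        stack.extend(reversed(objects.get(objid, [])))
    return order
```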
Yusuke Shinyama
2b56b2eedf
Merged.
2013-11-07 19:50:41 +09:00
Matthew Duggan
c1da8b835c
PEP8: Remove trailing whitespace
2013-11-07 16:14:53 +09:00
Matthew Duggan
10a68c83bd
Remove unused imports identified by pyflakes
2013-11-07 16:09:44 +09:00
Yusuke Shinyama
d3730a29ec
API change: process_pdf -> PDFPage.get_pages
2013-10-22 18:59:16 +09:00
Yusuke Shinyama
8a70a9f657
fixed: encoding problem with vertical characters.
2013-10-22 18:44:40 +09:00
Yusuke Shinyama
32844507ea
Fixed some style issues.
2013-10-19 08:41:01 +09:00
Yusuke Shinyama
28cb424f8f
Merge pull request #21 from eug48/master
dumppdf: support for extracting embedded files using the -E option
2013-10-18 16:23:09 -07:00
Yusuke Shinyama
6ca9ac5434
chmod fix.
2013-10-17 23:06:07 +09:00
Yusuke Shinyama
0ea08890d4
renamed: python2 -> python.
2013-10-17 23:05:27 +09:00
Yusuke Shinyama
6ad82e355c
Beating the codepage dragon.
2013-10-17 22:57:48 +09:00
Yusuke Shinyama
774827b4ce
Code cleanup: conv_cmap.py
2013-10-12 13:20:40 +09:00
Yusuke Shinyama
f85c374cae
Separated PDFPage to pdfpage.py.
2013-10-10 19:54:55 +09:00
Yusuke Shinyama
c926874d20
API Change: the PDFDocument constructor now takes a PDFParser; set_parser() is removed.
2013-10-10 18:40:06 +09:00
Yusuke Shinyama
2221163b94
Split pdfparser.py and pdfdocument.py.
2013-10-10 18:29:30 +09:00