Cathal Garvey
1b47bed306
Many changes to make pdf2txt.py work better in Py3, some in that script, others in module!
Sorry, changes should have been more atomic.
*In pdf2txt.py:*
* Re-wrote main function to use argparse instead of optparse.
* Manually tested in Py2/Py3 to get partial consistency.
* Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway.
* Py2 mode *probably* unchanged, cannot find any bugs yet...
* Kept old main function for posterity, for now.
*In utils:*
* Added a few compatibility functions (some string hax required chardet, new dependency):
  - make_compat_bytes(in_str) -> (py3 -> bytes | py2 -> str)
  - make_compat_str(in_str) -> (str)
  - compatible_encode_method(bytesorstring, encoding, erraction) -> (str)
*In pdfdevice:*
* To handle different output filetypes in Py3, injected lots of calls to new utils methods,
as well as some six.PYX checks and logic. These changes are largely responsible for
enhanced Py2/Py3 consistency.
*In converter:*
* To handle output filetypes in Py2, injected a few checks and fixes particularly around the
py2 `str.encode` method and its *assumed* usual use-analogies in Py3.
2015-05-17 21:08:57 +01:00
enkore
d0379a2c44
Fix utils.decode_text
2014-12-04 17:09:52 +01:00
cybjit
9b2e29396b
apply_png_predictor py3
2014-09-16 22:59:29 +02:00
cybjit
51a361c145
clean up HTMLConverter and XMLConverter encoding
2014-09-16 22:57:00 +02:00
unknown
29c07ea770
Python 3.4 support and tests
2014-09-03 15:26:08 +02:00
unknown
a6475b61b4
Python 3.4 support added and tested
2014-09-03 13:17:41 +02:00
unknown
faea7291a8
tests pass under Py 2.7 and 3.4
2014-09-01 14:16:49 +02:00
Yusuke Shinyama
1ccfaff411
String-Bytes distinction (first attempt).
2014-06-30 19:05:56 +09:00
Yusuke Shinyama
2e900e5d10
Fixed for consistent test results. (hopefully...)
2014-06-26 17:41:31 +09:00
Yusuke Shinyama
0387a6c260
Removed: tuple-unpacking args.
2014-06-15 12:12:13 +09:00
Yusuke Shinyama
d9680fca7e
Plane: preserve the object order so that the test result is always consistent.
2014-06-14 14:44:53 +09:00
Yusuke Shinyama
340387bfc6
Cleanup: isinstance
2014-03-28 17:50:59 +09:00
Yusuke Shinyama
636d4caeb3
Fixed the PNG predictor bug. Thanks to Gabor Molnar.
2014-03-24 19:57:05 +09:00
Yusuke Shinyama
c97ec3048e
Changed / to // for clarity.
2013-11-26 21:35:16 +09:00
Yusuke Shinyama
c8b6d4112a
Fixed: crash with negative layout bbox.
2013-11-09 15:10:14 +09:00
Matthew Duggan
2caa5edc25
PEP8: Whitespace changes to match pep8
2013-11-07 17:35:04 +09:00
Matthew Duggan
c1da8b835c
PEP8: Remove trailing whitespace
2013-11-07 16:14:53 +09:00
Yusuke Shinyama
e927bd307e
fixed: https://github.com/euske/pdfminer/issues/8
2013-10-22 18:24:39 +09:00
Yusuke Shinyama
0ea08890d4
renamed: python2 -> python.
2013-10-17 23:05:27 +09:00
Yusuke Shinyama
557c2c72e6
Removed ObjIdRange for terseness.
2013-10-10 18:34:43 +09:00
jcushman
da3f023b2d
Use set instead of list for Plane's internal collection of objects.
2012-06-22 16:36:33 -03:00
Yusuke Shinyama
46bb0107aa
fixed: crash due to small layout elements (thanks to hsoft)
2011-07-31 17:44:09 +10:00
Yusuke Shinyama
dc8fde0e47
added CCITTFaxFilter support and a very crude image extraction.
2011-07-18 21:07:00 +10:00
Yusuke Shinyama
0278076ea8
PNG predictor added
2011-06-07 00:46:33 +09:00
Yusuke Shinyama
18a5058af6
separated predictor functions.
2011-06-07 00:31:03 +09:00
Yusuke Shinyama
c134596e2f
code cleanup and testcase stabilization
2011-05-15 01:22:19 +09:00
Yusuke Shinyama
b8d516fc52
extended Plane class.
2011-05-14 14:16:40 +09:00
Yusuke Shinyama
8f9684f6a6
code cleanup: layout analysis
2011-04-21 22:07:04 +09:00
Yusuke Shinyama
18e782f330
canonicalize package names
2011-03-02 23:43:03 +09:00
Yusuke Shinyama
bb26cf9180
eliminate empty textboxes
2011-03-01 20:47:20 +09:00
Yusuke Shinyama
a8bf9b159e
docstring fix
2011-02-27 13:09:12 +09:00
Yusuke Shinyama
cabaa10e4f
layout analysis improvement
2011-02-27 12:56:28 +09:00
Yusuke Shinyama
b2d13db29a
code cleanup
2011-02-14 22:51:20 +09:00
yusuke.shinyama.dummy
9584845358
layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@268 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:05 +00:00
yusuke.shinyama.dummy
1904b61355
documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@266 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:40 +00:00
yusuke.shinyama.dummy
509ab66319
stay with python2
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy
69d9d85685
nunpack TypeError fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@246 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:52 +00:00
yusuke.shinyama.dummy
3305c07ba2
layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@245 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:39 +00:00
yusuke.shinyama.dummy
c81142aa44
image handling addition (untested)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@202 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:05:02 +00:00
yusuke.shinyama.dummy
23be96c49e
CAUTION! changed the way of internal layout handling.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@184 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-27 03:59:25 +00:00
yusuke.shinyama.dummy
0f8fe3f19e
Page rotation bug fixed.
Various minor fixes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@176 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-31 02:09:28 +00:00
yusuke.shinyama.dummy
98c8367339
warning removal.
code cleanup.
cmap bug fixed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@168 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-01 03:09:26 +00:00
yusuke.shinyama.dummy
0298e26acc
speed-tweak.diff from Yannick Gingras
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@158 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-14 11:29:40 +00:00
yusuke.shinyama.dummy
f444c88e3d
testing against None with "is", not using "=="
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@153 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-11-06 15:10:29 +00:00
yusuke.shinyama.dummy
7790808560
to 4-space indentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-24 04:41:59 +00:00
yusuke.shinyama.dummy
2ed6b90551
text rotation handling fixed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@137 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-10-03 04:36:54 +00:00
yusuke.shinyama.dummy
8a5bec5065
layout analysis improved.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@120 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-07-21 07:55:19 +00:00
yusuke.shinyama.dummy
af63784305
release-20090711
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@118 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-07-11 15:28:12 +00:00
yusuke.shinyama.dummy
3e12268bf6
rename package pdflib -> pdfminer.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@103 1aa58f4a-7d42-0410-adbc-911cccaed67c
2009-05-16 06:12:01 +00:00
yusuke.shinyama.dummy
8a77664c6b
changed again...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@36 1aa58f4a-7d42-0410-adbc-911cccaed67c
2008-06-29 08:49:28 +00:00