64 Commits (40aa2533c98fb9c6b700891356e638bd6821ad13)

Author SHA1 Message Date
Pieter Marsman 9fd7172f7b Cleanup utils.py 2019-10-17 12:14:02 +02:00
jet457 7e40fde320 Removing assertion in drange to allow equal inputs (#246) and mimic behaviour of built-in method range
Fixes #66, since it now allows the bbox to have 0 width or 0 height
Added tests for Plane since it is the API that uses drange
2019-10-17 12:04:25 +02:00
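
The behaviour this commit describes can be illustrated with a minimal sketch. `drange_sketch` is a hypothetical stand-in written for this note, not the library's own function; the formula is an assumption chosen so that equal inputs are accepted rather than rejected by an assertion.

```python
def drange_sketch(v0, v1, d):
    """Return the indices of the grid cells of size d that cover [v0, v1]."""
    # No `assert v0 < v1` here: equal inputs are accepted, as the commit describes.
    return range(int(v0) // d, int(v1 + d) // d)


# A zero-width interval still maps to the single cell that contains it.
print(list(drange_sketch(10, 10, 5)))  # [2]
print(list(drange_sketch(0, 10, 5)))   # [0, 1, 2]
# The built-in range likewise accepts equal endpoints instead of raising.
print(list(range(3, 3)))               # []
```

Under this assumption, a bbox with zero width or zero height still resolves to a grid cell instead of tripping an assertion.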
Tata Ganesh e03ecab856 Merge pull request #141 from timb07/speedup_layout
Speed up layout of text boxes
2018-11-08 20:28:40 +05:30
Tim Bell 8f8a78bb88 Remove now-unused csort() 2018-04-11 09:37:32 +10:00
Gregory Mori 335c25c045 only check for bytes input to enc() in python3
In python2, isinstance("", bytes) is true, causing enc() to
suppress any string input. This results in fontnames being lost
when running pdf2txt.py in python2.

As this check was not present in the original python2 version of
pdfminer, restrict it to only check when running in python3.
2018-04-09 12:21:59 -07:00
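
The version gate described in this commit might look roughly like the following. `enc_sketch` is a hypothetical illustration, not the actual `enc()` source; the escaping rules and the use of `sys.version_info` are assumptions made so the example is self-contained.

```python
import sys


def enc_sketch(x):
    """Escape a string for SGML/XML/HTML output (illustrative only)."""
    # Only treat bytes as "not text" on Python 3. On Python 2, bytes is an
    # alias of str, so an unconditional check would discard every string.
    if sys.version_info[0] >= 3 and isinstance(x, bytes):
        return ""
    return (x.replace("&", "&amp;")
             .replace(">", "&gt;")
             .replace("<", "&lt;")
             .replace('"', "&quot;"))


print(enc_sketch("Times&Roman"))   # text input is escaped and kept
print(repr(enc_sketch(b"Times")))  # bytes input is suppressed on Python 3
```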
KOLANICH 3bf3c97bbb Added a vector between 2 boxes which may be useful for users of the library 2018-02-16 14:49:12 +00:00
Hugh Secker-Walker 488545ddc7 Add string expressions to asserts showing local data (#67) 2017-05-29 09:06:09 +02:00
Andrew Baumann 9439a3a31a Miscellaneous bug fixes (#47)
* utils.decode_text: fix "TypeError: ord() expected string of length 1, but int found"

fixes https://github.com/goulu/pdfminer/issues/24

* pdfinterp.execute: don't assume that every keyword name can be decoded as utf-8

fixes "'str' does not support the buffer interface", https://github.com/goulu/pdfminer/issues/23

* default settings.STRICT to False, for compatibility with the original pdfminer

* PDFCIDFont: handle font registry/orderings that may be PDFObjRefs

* utils.nunpack: handle 8-byte integers
2017-02-06 14:57:01 +01:00
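
As a rough illustration of the last item (handling 8-byte integers when unpacking), here is a sketch built on the standard `struct` module. `nunpack_sketch`, the big-endian byte order, and the exact set of supported lengths are assumptions for this example, not a copy of `utils.nunpack`.

```python
import struct


def nunpack_sketch(s, default=0):
    """Unpack 1, 2, 3, 4 or 8 big-endian bytes into an unsigned integer."""
    n = len(s)
    if n == 0:
        return default
    if n == 1:
        return struct.unpack(">B", s)[0]
    if n == 2:
        return struct.unpack(">H", s)[0]
    if n == 3:
        # struct has no 3-byte format, so pad to 4 bytes first.
        return struct.unpack(">L", b"\x00" + s)[0]
    if n == 4:
        return struct.unpack(">L", s)[0]
    if n == 8:
        # The 8-byte case: an unsigned 64-bit integer.
        return struct.unpack(">Q", s)[0]
    raise TypeError("invalid length: %d" % n)


print(nunpack_sketch(b"\x01\x00"))                          # 256
print(nunpack_sketch(b"\x00\x00\x00\x01\x00\x00\x00\x00"))  # 4294967296
```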
Jin-tae Hwang 61d423d81c bugfix: if fontname is bytes then skip (#43) 2016-12-14 17:34:16 +01:00
Antonio Ercole De Luca 0fdebc6739 Removing all the "#!/usr/bin/env python" lines, they do not need for … (#34)
* Removing all the "#!/usr/bin/env python" lines, they do not need for python3, solving issue number: #19.

* Restored all the shebangs in the tools and tests folders (because they are real executables) but used "#!/usr/bin/env python" instead of "#!/usr/bin/python" as this blog points out: https://www.peterbe.com/plog/importance-of-env
Removed also the shebang from pdfminer/psparser.py file.
2016-11-08 20:01:11 +01:00
Friedrich Lindenberg 1820f96481 backport changes for upstream: #145, #95, #111, #117, #129, #132. 2016-09-23 14:31:31 +02:00
Cathal Garvey 403711ed13 Whoops, forgot to version-gate chardet in the actual code. Thanks Travis! 2015-05-30 19:33:35 +01:00
Cathal Garvey a2ad7a6d03 Fixed some bugs preventing all tests from passing in Py2. 2015-05-30 18:02:29 +01:00
Cathal Garvey 1b47bed306 Many changes to make pdf2txt.py work better in Py3, some in that script, others in module!
Sorry, changes should have been more atomic.

*In pdf2txt.py:*

* Re-wrote main function to use argparse instead of optparse.
* Manually tested in Py2/Py3 to get partial consistency.
* Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway.
* Py2 mode *probably* unchanged, cannot find any bugs yet...
* Kept old main function for posterity, for now.

*In utils:*

* Added a few compatibility functions (some string hax required chardet, new dependency):
    - make_compat_bytes(in_str)-> (py3->bytes | py2->str)
    - make_compat_str(in_str)-> (str)
    - compatible_encode_method(bytesorstring, encoding, erraction)-> (str)

*In pdfdevice:*

* To handle different output filetypes in Py3, injected lots of calls to new utils methods,
  as well as some six.PYX checks and logic. These changes are largely responsible for
  enhanced Py2/Py3 consistency.

*In converter:*

* To handle output filetypes in Py2, injected a few checks and fixes particularly around the
  py2 `str.encode` method and its *assumed* usual use-analogies in Py3.
2015-05-17 21:08:57 +01:00
enkore d0379a2c44 Fix utils.decode_text 2014-12-04 17:09:52 +01:00
cybjit 9b2e29396b apply_png_predictor py3 2014-09-16 22:59:29 +02:00
cybjit 51a361c145 clean up HTMLConverter and XMLConverter encoding 2014-09-16 22:57:00 +02:00
unknown 29c07ea770 Python 3.4 support and tests 2014-09-03 15:26:08 +02:00
unknown a6475b61b4 Python 3.4 support added and tested 2014-09-03 13:17:41 +02:00
unknown faea7291a8 tests pass under Py 2.7 and 3.4 2014-09-01 14:16:49 +02:00
Yusuke Shinyama 1ccfaff411 String-Bytes distinction (first attempt). 2014-06-30 19:05:56 +09:00
Yusuke Shinyama 2e900e5d10 Fixed for consistent test results. (hopefully...) 2014-06-26 17:41:31 +09:00
Yusuke Shinyama 0387a6c260 Removed: tuple-unpacking args. 2014-06-15 12:12:13 +09:00
Yusuke Shinyama d9680fca7e Plane: preserve the object order so that the test result is always consistent. 2014-06-14 14:44:53 +09:00
Yusuke Shinyama 340387bfc6 Cleanup: isinstance 2014-03-28 17:50:59 +09:00
Yusuke Shinyama 636d4caeb3 Fixed the PNG predictor bug. Thanks to Gabor Molnar. 2014-03-24 19:57:05 +09:00
Yusuke Shinyama c97ec3048e Changed / to // for clarity. 2013-11-26 21:35:16 +09:00
Yusuke Shinyama c8b6d4112a Fixed: crash with negative layout bbox. 2013-11-09 15:10:14 +09:00
Matthew Duggan 2caa5edc25 PEP8: Whitespace changes to match pep8 2013-11-07 17:35:04 +09:00
Matthew Duggan c1da8b835c PEP8: Remove trailing whitespace 2013-11-07 16:14:53 +09:00
Yusuke Shinyama e927bd307e fixed: https://github.com/euske/pdfminer/issues/8 2013-10-22 18:24:39 +09:00
Yusuke Shinyama 0ea08890d4 renamed: python2 -> python. 2013-10-17 23:05:27 +09:00
Yusuke Shinyama 557c2c72e6 Removed ObjIdRange for terseness. 2013-10-10 18:34:43 +09:00
jcushman da3f023b2d Use set instead of list for Plane's internal collection of objects. 2012-06-22 16:36:33 -03:00
Yusuke Shinyama 46bb0107aa fixed: crash due to small layout elements (thanks to hsoft) 2011-07-31 17:44:09 +10:00
Yusuke Shinyama dc8fde0e47 added CCITTFaxFilter support and a very crude image extraction. 2011-07-18 21:07:00 +10:00
Yusuke Shinyama 0278076ea8 PNG predictor added 2011-06-07 00:46:33 +09:00
Yusuke Shinyama 18a5058af6 separated predictor functions. 2011-06-07 00:31:03 +09:00
Yusuke Shinyama c134596e2f code cleanup and testcase stabilization 2011-05-15 01:22:19 +09:00
Yusuke Shinyama b8d516fc52 extended Plane class. 2011-05-14 14:16:40 +09:00
Yusuke Shinyama 8f9684f6a6 code cleanup: layout analysis 2011-04-21 22:07:04 +09:00
Yusuke Shinyama 18e782f330 canonicalize package names 2011-03-02 23:43:03 +09:00
Yusuke Shinyama bb26cf9180 eliminate empty textboxes 2011-03-01 20:47:20 +09:00
Yusuke Shinyama a8bf9b159e docstring fix 2011-02-27 13:09:12 +09:00
Yusuke Shinyama cabaa10e4f layout analysis improvement 2011-02-27 12:56:28 +09:00
Yusuke Shinyama b2d13db29a code cleanup 2011-02-14 22:51:20 +09:00
yusuke.shinyama.dummy 9584845358 layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@268 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:05 +00:00
yusuke.shinyama.dummy 1904b61355 documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@266 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:40 +00:00
yusuke.shinyama.dummy 509ab66319 stay with python2
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy 69d9d85685 nunpack TypeError fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@246 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:52 +00:00