Commit Graph

826 Commits (391fe149ca87b9f9688e0f94fe0b9bd15867b157)

Author SHA1 Message Date
Yusuke Shinyama b6e2d4888f Merge pull request #139 from Daniel-KM/fix-tests
Fixed tests.
2016-09-11 23:46:48 +09:00
Yusuke Shinyama 64fe538b24 Fixed: #114 (UnicodeEncodeError in PSLiteral) 2016-09-11 23:43:22 +09:00
Yusuke Shinyama 647a6c653c Added: LICENSE 2016-09-11 23:38:18 +09:00
Vinayak Mehta 2926002017 Replace old Adobe glyphlist link 2016-09-08 16:34:53 +05:30
Daniel Berthereau 10815bff7b Fixed tests. 2016-06-27 00:00:00 +02:00
Daniel Berthereau 395cdd7538 Fixed tests. 2016-06-27 00:00:00 +02:00
Philippe Guglielmetti 881ea17553 v 20160614 2016-06-14 19:02:07 +02:00
speedplane dcf07272a1 Revert changes unrelated to this feature. 2016-06-13 23:46:30 -04:00
speedplane 549b560765 Revert changes unrelated to this feature. 2016-06-13 23:44:54 -04:00
speedplane 2049462f6f Revert changes unrelated to this branch. 2016-06-13 23:42:21 -04:00
speedplane b0b8818a41 Fix a bug in pdfminer that occurs when two or more filters are applied to a stream and no parameters are specified. Previously the code dropped every filter after the first due to misapplication of the zip function. 2016-06-13 23:35:11 -04:00
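The truncation described in commit b0b8818a41 can be illustrated with a minimal sketch. The function names `pair_filters_buggy` and `pair_filters_fixed` are hypothetical and are not taken from the pdfminer source; the sketch only assumes that filters and their parameters are paired with `zip`, as the commit message states.

```python
def pair_filters_buggy(filters, params):
    """Assumed pre-fix behaviour: a lone params value is wrapped exactly once."""
    if not isinstance(filters, list):
        filters = [filters]
    if not isinstance(params, list):
        params = [params]  # length 1, however many filters there are
    # zip stops at the shortest input, so only the first filter survives.
    return list(zip(filters, params))


def pair_filters_fixed(filters, params):
    """Assumed fix: repeat the lone params value once per filter."""
    if not isinstance(filters, list):
        filters = [filters]
    if not isinstance(params, list):
        params = [params] * len(filters)
    return list(zip(filters, params))


filters = ["ASCII85Decode", "FlateDecode"]
print(pair_filters_buggy(filters, {}))  # [('ASCII85Decode', {})]
print(pair_filters_fixed(filters, {}))  # [('ASCII85Decode', {}), ('FlateDecode', {})]
```

Because `zip` silently stops at its shortest argument, the mismatch raises no error; the stream is simply decoded with fewer filters than it declares.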
Goulu 0d38aa1ff2 Merge pull request #22 from pudo/log-into-namespace
Make the logger run in a namespace.
2016-06-09 23:48:52 +02:00
Friedrich Lindenberg 1d54ecd31c Make the logger run in a namespace. 2016-05-20 21:12:05 +02:00
Goulu e121f7ec46 Merge pull request #21 from ivanteoh/master
Fix issues #20 - NameError: global name 'ImageWriter' is not defined
2016-05-01 20:09:10 +02:00
Ivan Teoh 2c8f226907 Fix issues #20 - NameError: global name 'ImageWriter' is not defined 2016-04-26 12:38:42 +10:00
Philippe Guglielmetti 21fd2bbd23 v 20160202 with Py 2.6 & Py 3.5 support 2016-02-02 15:38:51 +01:00
Goulu 5f888fe3fb Merge pull request #17 from orangain/ensure-lf
Ensure that command line tools use LF line endings to work on Linux/OS X
2016-02-02 15:25:45 +01:00
orangain 5a2e342a46 Add .gitattributes to always checkout *.py files with LF line endings 2016-01-25 14:27:01 +09:00
Goulu 5a23fad6fd Merge pull request #14 from orangain/close-device
Close device to write footer of xml/html files
2016-01-18 11:22:35 +01:00
Goulu 2103e5875e Merge pull request #13 from orangain/include-cmap
Include compiled cmap resources to simplify installation for CJK languages
2016-01-18 11:22:08 +01:00
Goulu 4f762cb897 Merge pull request #16 from stevenhair/settings-management
Improved settings management
2016-01-18 11:21:26 +01:00
Steve Hair 92c71436b9 Improved settings management 2016-01-10 12:17:38 -05:00
orangain f8a051adbd Close device to write footer of xml/html files 2015-12-27 20:57:00 +09:00
orangain f1d5d681b6 Include compiled cmap resources to simplify installation for CJK languages
* Run `make cmap` and `git add pdfminer/cmap`.
* Modify MANIFEST.in not to include cmaprsrc dir in the sdist package.
* Add pdfminer/cmap/README.txt to include license in the sdist package.
* Remove installation guide specific to CJK languages from README.md.
2015-12-27 13:32:29 +09:00
lucanaso 63bb3caec2 Fixed for rendering non breaking spaces (cid:160)
As stated in the PDF specification ISO 32000-1, table in Annex D.2 Latin Character Set and Encodings page 653 to 656 (available here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf):
"The SPACE character shall also be encoded as 312 in MacRomanEncoding and as 240 in WinAnsiEncoding. This duplicate code shall signify a nonbreaking space; it shall be typographically the same as (U+003A) SPACE."
The duplicate key was missing, so PDFMiner returned the string "(cid:160)".

This fix adds the duplicate key in latin_enc.py
glyphlist.py does not need to be modified as it already contains a key for non breaking space https://github.com/lucanaso/pdfminer/blob/master/pdfminer/glyphlist.py#L2755.
2015-12-09 16:47:32 +01:00
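The effect of the missing duplicate key in commit 63bb3caec2 can be sketched as follows. The row layout, `ENCODING_ROWS`, `build_code_to_name`, and `decode_cid` are hypothetical illustrations, not the actual contents of latin_enc.py. The sketch reads the quoted codes 312 and 240 as octal, which is consistent with the decimal 160 in the commit title.

```python
# Hypothetical rows: (glyph name, Standard, MacRoman, WinAnsi, PDFDoc) codes.
ENCODING_ROWS = [
    ("space", 32, 32, 32, 32),
    # Duplicate entry for the nonbreaking space (octal 312 / 240).
    ("space", None, 0o312, 0o240, None),
]


def build_code_to_name(rows, column):
    """Map character codes in one encoding column to glyph names."""
    table = {}
    for row in rows:
        code = row[column]
        if code is not None:
            table[code] = row[0]
    return table


def decode_cid(table, cid):
    """Return a space for known space codes, else a '(cid:N)' placeholder."""
    name = table.get(cid)
    return " " if name == "space" else "(cid:%d)" % cid


WIN_ANSI = build_code_to_name(ENCODING_ROWS, 3)
MAC_ROMAN = build_code_to_name(ENCODING_ROWS, 2)

# Without the duplicate row, code 160 has no glyph name and falls through.
without_fix = build_code_to_name(ENCODING_ROWS[:1], 3)
print(repr(decode_cid(without_fix, 160)))  # '(cid:160)'
print(repr(decode_cid(WIN_ANSI, 160)))     # ' '
```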
Goulu 72b2bc3197 Merge pull request #11 from metachris/pdfminerX
Pdfminer Updates
2015-12-06 18:56:53 +01:00
Chris Hager 8149be1669 bugfixes 2015-12-06 00:17:58 +01:00
Chris Hager a9a026b796 Merge remote-tracking branch 'origin/patch-1'
* origin/patch-1:
  Updated setup.py to work with Python 2.6
2015-12-06 00:13:31 +01:00
Chris Hager 146abb459f Updated setup.py to work with Python 2.6
Simple fix. Would you mind adding it and pushing to PyPI?
2015-11-08 02:32:23 +01:00
Chris Hager 2e1be5721f removed settings.ENFORCE_CHECK_EXTRACTABLE 2015-11-01 22:34:18 +01:00
Chris Hager b686dd0139 pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE 2015-11-01 22:28:08 +01:00
Goulu a46ea52e20 Merge pull request #7 from orangain/install_requires
Ensure to install required libraries on installation
2015-08-11 12:38:15 +02:00
Ivan Pozdeev 63c9378b8b make ValueError's descriptive 2015-08-10 03:14:51 +03:00
orangain e143ad7ba8 Ensure to install required libraries on installation 2015-08-06 20:55:57 +09:00
Goulu bc8d631a7c Merge pull request #6 from GreenLightGo/hotfix/strict-setting
change STRICT to be a settings attribute
2015-07-21 10:43:39 +02:00
Alex Zagorodniuk 131cb1ea92 change STRICT to be a settings attribute 2015-06-22 19:08:35 -04:00
Pablo Castellano 9af4fe85e1 README: Changed line about Python 3 support 2015-06-14 17:02:12 +02:00
Goulu 623bd98452 Update __init__.py
version 20150601
2015-06-01 10:21:51 +02:00
Goulu 30e14ddf65 Merge pull request #5 from cathalgarvey/master
Lots of changes to improve compatibility and modularity
2015-06-01 10:18:49 +02:00
Cathal Garvey e2d3adc8c1 Adding chardet to Travis 2015-05-30 19:35:05 +01:00
Cathal Garvey 403711ed13 Whoops, forgot to version-gate chardet in the actual code. Thanks Travis! 2015-05-30 19:33:35 +01:00
Cathal Garvey a2ad7a6d03 Fixed some bugs preventing all tests from passing in Py2. 2015-05-30 18:02:29 +01:00
Cathal Garvey 79c97ac221 Docstrings. 2015-05-30 17:16:06 +01:00
Cathal Garvey 268e9fb2bd Removed typechecking, nothing's exploded yet and argparse does lots of heavy lifting already. 2015-05-30 17:05:28 +01:00
Cathal Garvey 3b7edba48c Forgot to add the actual compartmentalised function.. 2015-05-30 17:04:28 +01:00
Cathal Garvey b3553cef10 Cleaning up pdf2txt.py after the partition/move. 2015-05-30 17:03:55 +01:00
Cathal Garvey cbe270a4bf Killed the old main function for pdf2txt.py 2015-05-30 16:37:22 +01:00
Cathal Garvey ead8e778a6 Successfully compartmentalised code, getting closer to moving pdf->text as a module function. 2015-05-30 16:27:58 +01:00
Cathal Garvey 08cb217983 Progress, progress.. not nearly atomic enough, sorry. 2015-05-30 16:14:24 +01:00
Cathal Garvey 1b47bed306 Many changes to make pdf2txt.py work better in Py3, some in that script, others in module!
Sorry, changes should have been more atomic.

*In pdf2txt.py:*

* Re-wrote main function to use argparse instead of optparse.
* Manually tested in Py2/Py3 to get partial consistency.
* Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway.
* Py2 mode *probably* unchanged, cannot find any bugs yet...
* Kept old main function for posterity, for now.

*In utils:*

* Added a few compatibility functions (some string hax required chardet, new dependency):
    - make_compat_bytes(in_str)-> (py3->bytes | py2->str)
    - make_compat_str(in_str)-> (str)
    - compatible_encode_method(bytesorstring, encoding, erraction)-> (str)

*In pdfdevice:*

* To handle different output filetypes in Py3, injected lots of calls to new utils methods,
  as well as some six.PYX checks and logic. These changes are largely responsible for
  enhanced Py2/Py3 consistency.

*In converter:*

* To handle output filetypes in Py2, injected a few checks and fixes particularly around the
  py2 `str.encode` method and its *assumed* usual use-analogies in Py3.
2015-05-17 21:08:57 +01:00