pdfminer.six

Commit Graph

Author	SHA1	Message	Date
Philippe Guglielmetti	881ea17553	v 20160614	2016-06-14 19:02:07 +02:00
speedplane	dcf07272a1	Revert changes unrelated to this feature.	2016-06-13 23:46:30 -04:00
speedplane	549b560765	Revert changes unrelated to this feature.	2016-06-13 23:44:54 -04:00
speedplane	2049462f6f	Revert changes unrelated to this branch.	2016-06-13 23:42:21 -04:00
speedplane	b0b8818a41	Fix a bug with pdfminer which occurs when two or more filters are applied to a stream, even though no parameters are specified. The code would previously drop all of the streams after the first due to misapplication of the zip function.	2016-06-13 23:35:11 -04:00
Goulu	0d38aa1ff2	Merge pull request #22 from pudo/log-into-namespace Make the logger run in a namespace.	2016-06-09 23:48:52 +02:00
Friedrich Lindenberg	1d54ecd31c	Make the logger run in a namespace.	2016-05-20 21:12:05 +02:00
Goulu	e121f7ec46	Merge pull request #21 from ivanteoh/master Fix issues #20 - NameError: global name 'ImageWriter' is not defined	2016-05-01 20:09:10 +02:00
Ivan Teoh	2c8f226907	Fix issues #20 - NameError: global name 'ImageWriter' is not defined	2016-04-26 12:38:42 +10:00
Philippe Guglielmetti	21fd2bbd23	v 20160202 with Py 2.6 & Py 3.5 support	2016-02-02 15:38:51 +01:00
Goulu	5f888fe3fb	Merge pull request #17 from orangain/ensure-lf Ensure that command line tools use LF line endings to work on Linux/OS X	2016-02-02 15:25:45 +01:00
orangain	5a2e342a46	Add .gitattributes to always checkout *.py files with LF line endings	2016-01-25 14:27:01 +09:00
Goulu	5a23fad6fd	Merge pull request #14 from orangain/close-device Close device to write footer of xml/html files	2016-01-18 11:22:35 +01:00
Goulu	2103e5875e	Merge pull request #13 from orangain/include-cmap Include compiled cmap resources to simplify installation for CJK languages	2016-01-18 11:22:08 +01:00
Goulu	4f762cb897	Merge pull request #16 from stevenhair/settings-management Improved settings management	2016-01-18 11:21:26 +01:00
Steve Hair	92c71436b9	Improved settings management	2016-01-10 12:17:38 -05:00
orangain	f8a051adbd	Close device to write footer of xml/html files	2015-12-27 20:57:00 +09:00
orangain	f1d5d681b6	Include compiled cmap resources to simplify installation for CJK languages * Run `make cmap` and `git add pdfminer/cmap`. * Modify MANIFEST.in not to include cmaprsrc dir in the sdist package. * Add pdfminer/cmap/README.txt to include license in the sdist package. * Remove installation guide specific to CJK languages from README.md.	2015-12-27 13:32:29 +09:00
lucanaso	63bb3caec2	Fixed for rendering non breaking spaces (cid:160) As stated in the PDF specification ISO 32000-1, table in Annex D.2 Latin Character Set and Encodings page 653 to 656 (available here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf): "The SPACE character shall also be encoded as 312 in MacRomanEncoding and as 240 in WinAnsiEncoding. This duplicate code shall signify a nonbreaking space; it shall be typographically the same as (U+003A) SPACE." The duplicate key was missing, therefore PDFMiner was returning the string "(cid:160)". This fix adds the duplicate key in latin_enc.py glyphlist.py does not need to be modified as it already contains a key for non breaking space https://github.com/lucanaso/pdfminer/blob/master/pdfminer/glyphlist.py#L2755.	2015-12-09 16:47:32 +01:00
Goulu	72b2bc3197	Merge pull request #11 from metachris/pdfminerX Pdfminer Updates	2015-12-06 18:56:53 +01:00
Chris Hager	8149be1669	bugfixes	2015-12-06 00:17:58 +01:00
Chris Hager	a9a026b796	Merge remote-tracking branch 'origin/patch-1' * origin/patch-1: Updated setup.py to work with Python 2.6	2015-12-06 00:13:31 +01:00
Chris Hager	146abb459f	Updated setup.py to work with Python 2.6 Simple fix. Mind to add and push to PyPi?	2015-11-08 02:32:23 +01:00
Chris Hager	2e1be5721f	removed settings.ENFORCE_CHECK_EXTRACTABLE	2015-11-01 22:34:18 +01:00
Chris Hager	b686dd0139	pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE	2015-11-01 22:28:08 +01:00
Goulu	a46ea52e20	Merge pull request #7 from orangain/install_requires Ensure to install required libraries on installation	2015-08-11 12:38:15 +02:00
Ivan Pozdeev	63c9378b8b	make ValueError's descriptive	2015-08-10 03:14:51 +03:00
orangain	e143ad7ba8	Ensure to install required libraries on installation	2015-08-06 20:55:57 +09:00
Goulu	bc8d631a7c	Merge pull request #6 from GreenLightGo/hotfix/strict-setting change STRICT to be a settings attribute	2015-07-21 10:43:39 +02:00
Alex Zagorodniuk	131cb1ea92	change STRICT to be a settings attribute	2015-06-22 19:08:35 -04:00
Pablo Castellano	9af4fe85e1	README: Changed line about Python 3 support	2015-06-14 17:02:12 +02:00
Goulu	623bd98452	Update __init__.py version 20150601	2015-06-01 10:21:51 +02:00
Goulu	30e14ddf65	Merge pull request #5 from cathalgarvey/master Lots of changes to improve compatibility and modularity	2015-06-01 10:18:49 +02:00
Cathal Garvey	e2d3adc8c1	Adding chardet to Travis	2015-05-30 19:35:05 +01:00
Cathal Garvey	403711ed13	Whoops, forgot to version-gate chardet in the actual code. Thanks Travis!	2015-05-30 19:33:35 +01:00
Cathal Garvey	a2ad7a6d03	Fixed some bugs preventing all tests from passing in Py2.	2015-05-30 18:02:29 +01:00
Cathal Garvey	79c97ac221	Docstrings.	2015-05-30 17:16:06 +01:00
Cathal Garvey	268e9fb2bd	Removed typechecking, nothing's exploded yet and argparse does lots of heavy lifting already.	2015-05-30 17:05:28 +01:00
Cathal Garvey	3b7edba48c	Forgot to add the actual compartmentalised function..	2015-05-30 17:04:28 +01:00
Cathal Garvey	b3553cef10	Cleaning up pdf2txt.py after the partition/move.	2015-05-30 17:03:55 +01:00
Cathal Garvey	cbe270a4bf	Killed the old main function for pdf2txt.py	2015-05-30 16:37:22 +01:00
Cathal Garvey	ead8e778a6	Successfully compartmentalised code, getting closer to moving pdf->text as a module function.	2015-05-30 16:27:58 +01:00
Cathal Garvey	08cb217983	Progress, progress.. not nearly atomic enough, sorry.	2015-05-30 16:14:24 +01:00
Cathal Garvey	1b47bed306	Many changes to make pdf2txt.py work better in Py3, some in that script, others in module! Sorry, changes should have been more atomic. In pdf2txt.py: * Re-wrote main function to use argparse instead of optparse. * Manually tested in Py2/Py3 to get partial consistency. * Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway. * Py2 mode probably unchanged, cannot find any bugs yet... * Kept old main function for posterity, for now. In utils: * Added a few compatibility functions (some string hax required chardet, new dependency): - make_compat_bytes(in_str)-> (py3->bytes \| py2->str) - make_compat_str(in_str)-> (str) - compatible_encode_method(bytesorstring, encoding, erraction)-> (str) In pdfdevice: * To handle different output filetypes in Py3, injected lots of calls to new utils methods, as well as some six.PYX checks and logic. These changes are largely responsible for enhanced Py2/Py3 consistency. In converter: * To handle output filetypes in Py2, injected a few checks and fixes particularly around the py2 `str.encode` method and its assumed usual use-analogies in Py3.	2015-05-17 21:08:57 +01:00
Yusuke Shinyama	14fd0fd2d6	Fixed: #84 (fontname was in unicode)	2015-04-05 19:02:02 +09:00
Ashley Blackmore	1dbe9ff7e7	Update setup.py Install missing pycrypto lib	2015-02-18 18:35:53 +01:00
speedplane	5609418351	Add gz to gitignore.	2014-12-14 01:29:39 -05:00
speedplane	69afd3dd30	Use a .gitignore file.	2014-12-14 01:23:44 -05:00
speedplane	2199c25493	Add my own .gitignore.	2014-12-12 00:37:54 -05:00
speedplane	806ee603ff	More fixes to layout. The compute neighbors function for horizontal lines is only intended to find neighbors on differing lines. However, it's entirely possible that horizontal neighbors could appear. This commit finds horizontal neighbors in a horizontal line and merges them together into a single horizontal line if necessary. This leads to much better text extraction if the PDF was created in a funky way. For example (test case coming), I have seen PDFs which are written almost like vertical columns, but the text is entirely horizontal.	2014-12-12 00:36:59 -05:00

770 Commits (347c125fb88d0b62d9e0c599e266bf8ca47d915a)
