pdfminer.six

Commit Graph

Author	SHA1	Message	Date
speedplane	2049462f6f	Revert changes unrelated to this branch.	2016-06-13 23:42:21 -04:00
speedplane	b0b8818a41	Fix a bug with pdfminer which occurs when two or more filters are applied to a stream, even though no parameters are specified. The code would previously drop all of the streams after the first due to misapplication of the zip function.	2016-06-13 23:35:11 -04:00
Goulu	0d38aa1ff2	Merge pull request #22 from pudo/log-into-namespace Make the logger run in a namespace.	2016-06-09 23:48:52 +02:00
Friedrich Lindenberg	1d54ecd31c	Make the logger run in a namespace.	2016-05-20 21:12:05 +02:00
Goulu	e121f7ec46	Merge pull request #21 from ivanteoh/master Fix issues #20 - NameError: global name 'ImageWriter' is not defined	2016-05-01 20:09:10 +02:00
Ivan Teoh	2c8f226907	Fix issues #20 - NameError: global name 'ImageWriter' is not defined	2016-04-26 12:38:42 +10:00
Philippe Guglielmetti	21fd2bbd23	v 20160202 with Py 2.6 & Py 3.5 support	2016-02-02 15:38:51 +01:00
Goulu	5f888fe3fb	Merge pull request #17 from orangain/ensure-lf Ensure that command line tools use LF line endings to work on Linux/OS X	2016-02-02 15:25:45 +01:00
orangain	5a2e342a46	Add .gitattributes to always checkout *.py files with LF line endings	2016-01-25 14:27:01 +09:00
Goulu	5a23fad6fd	Merge pull request #14 from orangain/close-device Close device to write footer of xml/html files	2016-01-18 11:22:35 +01:00
Goulu	2103e5875e	Merge pull request #13 from orangain/include-cmap Include compiled cmap resources to simplify installation for CJK languages	2016-01-18 11:22:08 +01:00
Goulu	4f762cb897	Merge pull request #16 from stevenhair/settings-management Improved settings management	2016-01-18 11:21:26 +01:00
Steve Hair	92c71436b9	Improved settings management	2016-01-10 12:17:38 -05:00
orangain	f8a051adbd	Close device to write footer of xml/html files	2015-12-27 20:57:00 +09:00
orangain	f1d5d681b6	Include compiled cmap resources to simplify installation for CJK languages * Run `make cmap` and `git add pdfminer/cmap`. * Modify MANIFEST.in not to include cmaprsrc dir in the sdist package. * Add pdfminer/cmap/README.txt to include license in the sdist package. * Remove installation guide specific to CJK languages from README.md.	2015-12-27 13:32:29 +09:00
lucanaso	63bb3caec2	Fixed for rendering non breaking spaces (cid:160) As stated in the PDF specification ISO 32000-1, table in Annex D.2 Latin Character Set and Encodings page 653 to 656 (available here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf): "The SPACE character shall also be encoded as 312 in MacRomanEncoding and as 240 in WinAnsiEncoding. This duplicate code shall signify a nonbreaking space; it shall be typographically the same as (U+003A) SPACE." The duplicate key was missing, therefore PDFMiner was returning the string "(cid:160)". This fix adds the duplicate key in latin_enc.py glyphlist.py does not need to be modified as it already contains a key for non breaking space https://github.com/lucanaso/pdfminer/blob/master/pdfminer/glyphlist.py#L2755.	2015-12-09 16:47:32 +01:00
Goulu	72b2bc3197	Merge pull request #11 from metachris/pdfminerX Pdfminer Updates	2015-12-06 18:56:53 +01:00
Chris Hager	8149be1669	bugfixes	2015-12-06 00:17:58 +01:00
Chris Hager	a9a026b796	Merge remote-tracking branch 'origin/patch-1' * origin/patch-1: Updated setup.py to work with Python 2.6	2015-12-06 00:13:31 +01:00
Chris Hager	146abb459f	Updated setup.py to work with Python 2.6 Simple fix. Mind to add and push to PyPi?	2015-11-08 02:32:23 +01:00
Chris Hager	2e1be5721f	removed settings.ENFORCE_CHECK_EXTRACTABLE	2015-11-01 22:34:18 +01:00
Chris Hager	b686dd0139	pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE	2015-11-01 22:28:08 +01:00
Goulu	a46ea52e20	Merge pull request #7 from orangain/install_requires Ensure to install required libraries on installation	2015-08-11 12:38:15 +02:00
Ivan Pozdeev	63c9378b8b	make ValueError's descriptive	2015-08-10 03:14:51 +03:00
orangain	e143ad7ba8	Ensure to install required libraries on installation	2015-08-06 20:55:57 +09:00
Goulu	bc8d631a7c	Merge pull request #6 from GreenLightGo/hotfix/strict-setting change STRICT to be a settings attribute	2015-07-21 10:43:39 +02:00
Alex Zagorodniuk	131cb1ea92	change STRICT to be a settings attribute	2015-06-22 19:08:35 -04:00
Pablo Castellano	9af4fe85e1	README: Changed line about Python 3 support	2015-06-14 17:02:12 +02:00
Goulu	623bd98452	Update __init__.py version 20150601	2015-06-01 10:21:51 +02:00
Goulu	30e14ddf65	Merge pull request #5 from cathalgarvey/master Lots of changes to improve compatibility and modularity	2015-06-01 10:18:49 +02:00
Cathal Garvey	e2d3adc8c1	Adding chardet to Travis	2015-05-30 19:35:05 +01:00
Cathal Garvey	403711ed13	Whoops, forgot to version-gate chardet in the actual code. Thanks Travis!	2015-05-30 19:33:35 +01:00
Cathal Garvey	a2ad7a6d03	Fixed some bugs preventing all tests from passing in Py2.	2015-05-30 18:02:29 +01:00
Cathal Garvey	79c97ac221	Docstrings.	2015-05-30 17:16:06 +01:00
Cathal Garvey	268e9fb2bd	Removed typechecking, nothing's exploded yet and argparse does lots of heavy lifting already.	2015-05-30 17:05:28 +01:00
Cathal Garvey	3b7edba48c	Forgot to add the actual compartmentalised function..	2015-05-30 17:04:28 +01:00
Cathal Garvey	b3553cef10	Cleaning up pdf2txt.py after the partition/move.	2015-05-30 17:03:55 +01:00
Cathal Garvey	cbe270a4bf	Killed the old main function for pdf2txt.py	2015-05-30 16:37:22 +01:00
Cathal Garvey	ead8e778a6	Successfully compartmentalised code, getting closer to moving pdf->text as a module function.	2015-05-30 16:27:58 +01:00
Cathal Garvey	08cb217983	Progress, progress.. not nearly atomic enough, sorry.	2015-05-30 16:14:24 +01:00
Cathal Garvey	1b47bed306	Many changes to make pdf2txt.py work better in Py3, some in that script, others in module! Sorry, changes should have been more atomic. In pdf2txt.py: * Re-wrote main function to use argparse instead of optparse. * Manually tested in Py2/Py3 to get partial consistency. * Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway. * Py2 mode probably unchanged, cannot find any bugs yet... * Kept old main function for posterity, for now. In utils: * Added a few compatibility functions (some string hax required chardet, new dependency): - make_compat_bytes(in_str)-> (py3->bytes | py2->str) - make_compat_str(in_str)-> (str) - compatible_encode_method(bytesorstring, encoding, erraction)-> (str) In pdfdevice: * To handle different output filetypes in Py3, injected lots of calls to new utils methods, as well as some six.PYX checks and logic. These changes are largely responsible for enhanced Py2/Py3 consistency. In converter: * To handle output filetypes in Py2, injected a few checks and fixes particularly around the py2 `str.encode` method and its assumed usual use-analogies in Py3.	2015-05-17 21:08:57 +01:00
Yusuke Shinyama	14fd0fd2d6	Fixed: #84 (fontname was in unicode)	2015-04-05 19:02:02 +09:00
Ashley Blackmore	1dbe9ff7e7	Update setup.py Install missing pycrypto lib	2015-02-18 18:35:53 +01:00
speedplane	5609418351	Add gz to gitignore.	2014-12-14 01:29:39 -05:00
speedplane	69afd3dd30	Use a .gitignore file.	2014-12-14 01:23:44 -05:00
speedplane	2199c25493	Add my own .gitignore.	2014-12-12 00:37:54 -05:00
speedplane	806ee603ff	More fixes to layout. The compute neighbors function for horizontal lines is only intended to find neighbors on differing lines. However, it's entirely possible that horizontal neighbors could appear. This commit finds horizontal neighbors in a horizontal line and merges them together into a single horizontal line if necessary. This leads to much better text extraction if the PDF was created in a funky way. For example (test case coming), I have seen PDFs which are written almost like vertical columns, but the text is entirely horizontal.	2014-12-12 00:36:59 -05:00
speedplane	45170e7183	There are a number of relatively complex changes here. Comments are in order of where the change appears. 1. When detecting text in a horizontal line, we already add a space between words if they are separated by more than word_margin. However now, we only do it if there is not already an existing space. This prevents multiple spaces being placed between words. 2. Detect a horizontal line if the line is zero width. This improves our detection of horizontal lines when looking for both horizontal and vertical. 3. Don't detect a vertical line if the previous letter is whitespace. Prevents double spaces being caught as vert lines. 4. Improve upon an unfortunate O(N^2) algorithm which I have seen taking many minutes to execute. Unfortunately, while the "fix" reduces algorithmic complexity, it isn't technically correct, so we only do it when we know things will take a long time.	2014-12-12 00:36:59 -05:00
speedplane	c32550dd4a	Merge branch 'fix-makefile'	2014-12-11 00:54:14 -05:00
speedplane	5cbdd915c7	Remove the dependency on python2. Also, allow tests to be run on cygwin by checking for it, and converting unix2dos line endings.	2014-12-11 00:53:33 -05:00
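
The zip truncation described in commit b0b8818a41 can be sketched as follows. This is a minimal illustration under stated assumptions, not the pdfminer source: the function names `pair_filters_buggy` and `pair_filters_fixed` and the sample filter list are hypothetical, and the exact shape of the original code is assumed from the commit message alone.

```python
def pair_filters_buggy(filters, params=None):
    """Hypothetical pre-fix behaviour: a single params dict is wrapped once."""
    if params is None:
        params = {}
    if not isinstance(params, list):
        params = [params]  # one entry, regardless of how many filters exist
    # zip() stops at the shorter input, so filters after the first are dropped.
    return list(zip(filters, params))


def pair_filters_fixed(filters, params=None):
    """Hypothetical fix: repeat the params entry once per filter before zipping."""
    if params is None:
        params = {}
    if not isinstance(params, list):
        params = [params] * len(filters)
    return list(zip(filters, params))


# Two filters applied to one stream, with no parameters specified.
filters = ["ASCII85Decode", "FlateDecode"]
buggy = pair_filters_buggy(filters)
fixed = pair_filters_fixed(filters)
print(buggy)  # only the first filter survives
print(fixed)  # both filters are kept, each paired with empty params
```

Padding the parameter list to the length of the filter list keeps `zip` from silently shortening the result, which appears to be the failure the commit message describes.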

867 Commits (8ea9f1091a7eef307a80483fbdc6265e1fcf925f)