pdfminer.six

Commit Graph

Author	SHA1	Message	Date
Ivan Pozdeev	63c9378b8b	make ValueError's descriptive	2015-08-10 03:14:51 +03:00
orangain	e143ad7ba8	Ensure to install required libraries on installation	2015-08-06 20:55:57 +09:00
Goulu	bc8d631a7c	Merge pull request #6 from GreenLightGo/hotfix/strict-setting change STRICT to be a settings attribute	2015-07-21 10:43:39 +02:00
Alex Zagorodniuk	131cb1ea92	change STRICT to be a settings attribute	2015-06-22 19:08:35 -04:00
Pablo Castellano	9af4fe85e1	README: Changed line about Python 3 support	2015-06-14 17:02:12 +02:00
Goulu	623bd98452	Update __init__.py version 20150601	2015-06-01 10:21:51 +02:00
Goulu	30e14ddf65	Merge pull request #5 from cathalgarvey/master Lots of changes to improve compatibility and modularity	2015-06-01 10:18:49 +02:00
Cathal Garvey	e2d3adc8c1	Adding chardet to Travis	2015-05-30 19:35:05 +01:00
Cathal Garvey	403711ed13	Whoops, forgot to version-gate chardet in the actual code. Thanks Travis!	2015-05-30 19:33:35 +01:00
Cathal Garvey	a2ad7a6d03	Fixed some bugs preventing all tests from passing in Py2.	2015-05-30 18:02:29 +01:00
Cathal Garvey	79c97ac221	Docstrings.	2015-05-30 17:16:06 +01:00
Cathal Garvey	268e9fb2bd	Removed typechecking, nothing's exploded yet and argparse does lots of heavy lifting already.	2015-05-30 17:05:28 +01:00
Cathal Garvey	3b7edba48c	Forgot to add the actual compartmentalised function..	2015-05-30 17:04:28 +01:00
Cathal Garvey	b3553cef10	Cleaning up pdf2txt.py after the partition/move.	2015-05-30 17:03:55 +01:00
Cathal Garvey	cbe270a4bf	Killed the old main function for pdf2txt.py	2015-05-30 16:37:22 +01:00
Cathal Garvey	ead8e778a6	Successfully compartmentalised code, getting closer to moving pdf->text as a module function.	2015-05-30 16:27:58 +01:00
Cathal Garvey	08cb217983	Progress, progress.. not nearly atomic enough, sorry.	2015-05-30 16:14:24 +01:00
Cathal Garvey	1b47bed306	Many changes to make pdf2txt.py work better in Py3, some in that script, others in module! Sorry, changes should have been more atomic. In pdf2txt.py: * Re-wrote main function to use argparse instead of optparse. * Manually tested in Py2/Py3 to get partial consistency. * Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway. * Py2 mode probably unchanged, cannot find any bugs yet... * Kept old main function for posterity, for now. In utils: * Added a few compatibility functions (some string hax required chardet, new dependency): - make_compat_bytes(in_str)-> (py3->bytes \| py2->str) - make_compat_str(in_str)-> (str) - compatible_encode_method(bytesorstring, encoding, erraction)-> (str) In pdfdevice: * To handle different output filetypes in Py3, injected lots of calls to new utils methods, as well as some six.PYX checks and logic. These changes are largely responsible for enhanced Py2/Py3 consistency. In converter: * To handle output filetypes in Py2, injected a few checks and fixes particularly around the py2 `str.encode` method and its assumed usual use-analogies in Py3.	2015-05-17 21:08:57 +01:00
Yusuke Shinyama	14fd0fd2d6	Fixed: #84 (fontname was in unicode)	2015-04-05 19:02:02 +09:00
Ashley Blackmore	1dbe9ff7e7	Update setup.py Install missing pycrypto lib	2015-02-18 18:35:53 +01:00
speedplane	5609418351	Add gz to gitignore.	2014-12-14 01:29:39 -05:00
speedplane	69afd3dd30	Use a .gitignore file.	2014-12-14 01:23:44 -05:00
speedplane	2199c25493	Add my own .gitignore.	2014-12-12 00:37:54 -05:00
speedplane	806ee603ff	More fixes to layout. The compute neighbors function for horizontal lines is only intended to find neighbors on differing lines. However, it's entirely possible that horizontal neighbors could appear. This commit finds horizontal neighbors in a horizonal line and merges them together into a single horizontal line if necessary. This leads to much better text extraction if the PDF was created in a funky way. For example (test case coming), I have seen PDFs which are written almost like vertical columns, but the text is entirely horizontal.	2014-12-12 00:36:59 -05:00
speedplane	45170e7183	There are a number of relatively complex changes here. Comments are in order of where the change appears. 1. When detecting text in a horizontal line, we already add a space between words if separated by more than word_margin apart. However now, we only do it if there is not already an existing space. This prevents multiple spaces being placed between words. 2. Detect a horizontal line if the line is zero width. This improves our detection of horizonal lines when looking for both horizontal and vertical. 3. Don't detect a vertical line if the previous letter is whitspace. Prevents double spaces being caught as vert lines. 4. Improve upon an unfortunate O(N^2) algorithm which I have seen taking many minutes to execute. Unfortunately, while the "fix" reduces algorithmic complexity, it isn't technically correct, so we only do it when we know things will take a long time.	2014-12-12 00:36:59 -05:00
speedplane	c32550dd4a	Merge branch 'fix-makefile'	2014-12-11 00:54:14 -05:00
speedplane	5cbdd915c7	Remove the dependancy on python2. Also, allow tests to be run on cygwin by checking for it, and converting unix2dos line endings.	2014-12-11 00:53:33 -05:00
speedplane	830b2403e2	Merge branch 'euske-main/master'	2014-12-11 00:06:46 -05:00
Yusuke Shinyama	0112112458	Fixed: crash on invalid chr number.	2014-12-09 22:55:47 +09:00
Yusuke Shinyama	75206ba18d	Removed: .gitignore	2014-12-09 22:49:13 +09:00
Yusuke Shinyama	4b585221e2	Merge pull request #76 from speedplane/master Fix Unicode Bug + Add GitIgnore + Add Debug Flags	2014-12-09 22:22:33 +09:00
Philippe Guglielmetti	448aa08bc4	Merge pull request #4 from enkore/master Fix utils.decode_text	2014-12-05 09:58:58 +01:00
enkore	d0379a2c44	Fix utils.decode_text	2014-12-04 17:09:52 +01:00
speedplane	36977fbe08	Add debug flags for much of the debug output.	2014-11-11 23:36:58 -05:00
speedplane	1067cb9f9f	Use a .gitignore file.	2014-11-11 23:36:26 -05:00
speedplane	ecc4d05675	Fix a unicode conversion bug. See https://github.com/euske/pdfminer/issues/75	2014-11-11 23:34:33 -05:00
Philippe Guglielmetti	0e40264071	Merge pull request #3 from Cybjit/master Samples and latin1 passwords	2014-09-17 07:22:52 +02:00
cybjit	515687e1bb	more xrange to range	2014-09-16 23:17:31 +02:00
cybjit	2639b15ef4	guess argv encoding in py2 using sys.stdin.encoding	2014-09-16 23:17:26 +02:00
cybjit	9b2e29396b	apply_png_predictor py3	2014-09-16 22:59:29 +02:00
cybjit	ad05121c69	password py3	2014-09-16 22:59:00 +02:00
cybjit	14585987c3	keep password api unicode, latin1 or utf-8 is encoded in handler	2014-09-16 22:58:25 +02:00
cybjit	2260f77b19	fix dict_value usage in strict mode	2014-09-16 22:57:29 +02:00
cybjit	51a361c145	clean up HTMLConverter and XMLConverter encoding	2014-09-16 22:57:00 +02:00
cybjit	2ee7153f6e	add python3 in sample Makefile	2014-09-16 22:56:13 +02:00
Goulu	f577f76c52	renamed as pdfminer.six in PyPi	2014-09-15 11:10:00 +02:00
Goulu	03de0f4db8	forgot 'six' requirement ...	2014-09-15 10:42:08 +02:00
Goulu	8861d7e0ed	version 20140915 pushed to PyPi as pdfminer_six	2014-09-15 10:33:04 +02:00
Philippe Guglielmetti	4f8aa9ff5b	Merge pull request #2 from Cybjit/master CMap fixes and speed improvements	2014-09-12 07:33:06 +02:00
cybjit	714423883c	setup logging for pdf2txt and fix dumppdf	2014-09-12 00:29:31 +02:00

794 Commits (410d7ecac304100b6d2c2a08aeb3b80510dbcb96)