pdfminer.six

Commit Graph

Author	SHA1	Message	Date
Pieter Marsman	2bee7d8dcf	Fix wrong ordering of grouping textboxes introduced by #315. The first grouping of textboxes should be skipped if there are intermediate textboxes. (#335) Fixes #334	2019-11-10 12:18:49 +01:00
Pieter Marsman	bc034c8e59	Create sphinx documentation for Read the Docs (#329). Fixes #171. Fixes #199. Fixes #118. Fixes #178. Added: tests for building documentation and example code in documentation. Added: docstrings for commonly used functions and classes. Removed: old documentation.	2019-11-07 21:12:34 +01:00
Jianfeng	44b223cf0a	Speedup grouping of textboxes (#315). Changed: using a heap instead of a SortedList and avoid rebuilding the heap in each iteration. Changed: avoid potentially huge number of variable assignments in list comprehension. Changed: avoid repeatedly evaluating `obj is obj` in list comprehension by storing id(obj).	2019-10-31 09:22:58 +01:00
Tata Ganesh	e03ecab856	Merge pull request #141 from timb07/speedup_layout Speed up layout of text boxes	2018-11-08 20:28:40 +05:30
Tim Bell	1cbeaebfce	Fix Python 2.6 incompatibility	2018-04-11 10:34:15 +10:00
Tim Bell	0c8cf748fe	Fix copy-paste error	2018-04-11 10:15:32 +10:00
Tim Bell	2dda2b12b4	Speedup layout with .sort() and sortedcontainers.SortedListWithKey()	2018-04-11 09:03:32 +10:00
Quentin Pradet	2231f0892e	Send non-stroke color to XML conversion Inspired by https://github.com/euske/pdfminer/pull/158 from @andruo11 and https://github.com/euske/pdfminer/pull/197 from @staccatosound.	2018-03-06 14:11:48 +04:00
Hugh Secker-Walker	488545ddc7	Add string expressions to asserts showing local data (#67)	2017-05-29 09:06:09 +02:00
Philippe Guglielmetti	52feb22eeb	Merge remote-tracking branch 'origin/master'. Conflicts: MANIFEST.in, README.md, pdfminer/latin_enc.py, pdfminer/pdfdocument.py, pdfminer/pdfinterp.py, pdfminer/pdfpage.py, pdfminer/pdftypes.py, pdfminer/psparser.py, pdfminer/utils.py, samples/Makefile, setup.py	2017-01-19 08:03:16 +01:00
Humberto Pereira	e6ad15af79	Added painting information (#37): added color support to stroking and non stroking color spaces; extended LTCurve, LTLine and LTRect to save painting information; modified PDFLayoutAnalyzer to populate the shapes with painting information	2016-11-08 20:01:58 +01:00
Antonio Ercole De Luca	0fdebc6739	Removing all the "#!/usr/bin/env python" lines, they do not need for … (#34) * Removing all the "#!/usr/bin/env python" lines, they do not need for python3, solving issue number: #19. * Restored all the shebangs in the tools and tests folders (because they are real executables) but used "#!/usr/bin/env python" instead of "#!/usr/bin/python" as this blog points out: https://www.peterbe.com/plog/importance-of-env Removed also the shebang from pdfminer/psparser.py file.	2016-11-08 20:01:11 +01:00
Yusuke Shinyama	8150458718	Added: a simpler ordering mode when 1<F.	2016-09-26 18:06:34 +09:00
speedplane	2049462f6f	Revert changes unrelated to this branch.	2016-06-13 23:42:21 -04:00
speedplane	806ee603ff	More fixes to layout. The compute neighbors function for horizontal lines is only intended to find neighbors on differing lines. However, it's entirely possible that horizontal neighbors could appear. This commit finds horizontal neighbors in a horizontal line and merges them together into a single horizontal line if necessary. This leads to much better text extraction if the PDF was created in a funky way. For example (test case coming), I have seen PDFs which are written almost like vertical columns, but the text is entirely horizontal.	2014-12-12 00:36:59 -05:00
speedplane	45170e7183	There are a number of relatively complex changes here. Comments are in order of where the change appears. 1. When detecting text in a horizontal line, we already add a space between words if they are separated by more than word_margin. However now, we only do it if there is not already an existing space. This prevents multiple spaces being placed between words. 2. Detect a horizontal line if the line is zero width. This improves our detection of horizontal lines when looking for both horizontal and vertical. 3. Don't detect a vertical line if the previous letter is whitespace. Prevents double spaces being caught as vert lines. 4. Improve upon an unfortunate O(N^2) algorithm which I have seen taking many minutes to execute. Unfortunately, while the "fix" reduces algorithmic complexity, it isn't technically correct, so we only do it when we know things will take a long time.	2014-12-12 00:36:59 -05:00
unknown	29c07ea770	Python 3.4 support and tests	2014-09-03 15:26:08 +02:00
Yusuke Shinyama	8791355e1d	Cleanup imports. Use relative imports.	2014-06-26 18:12:39 +09:00
Yusuke Shinyama	2e900e5d10	Fixed for consistent test results. (hopefully...)	2014-06-26 17:41:31 +09:00
Yusuke Shinyama	0387a6c260	Removed: tuple-unpacking args.	2014-06-15 12:12:13 +09:00
Yusuke Shinyama	a8ec99a848	More autotest tweaks.	2014-06-15 10:52:59 +09:00
Yusuke Shinyama	1384a3fe8d	Code cleanup: removed some debug flags.	2014-06-14 15:43:10 +09:00
Yusuke Shinyama	8e8e22c095	Fixed a layout bug introduced at `c97ec304`.	2014-06-13 23:05:04 +09:00
Yusuke Shinyama	340387bfc6	Cleanup: isinstance	2014-03-28 17:50:59 +09:00
Yusuke Shinyama	c97ec3048e	Changed / to // for clarity.	2013-11-26 21:35:16 +09:00
Yusuke Shinyama	acad011e3f	Code cleanup.	2013-11-11 20:46:30 +09:00
Yusuke Shinyama	cbef967fbf	Renamed: LTAnon -> LTAnno	2013-11-11 19:17:45 +09:00
Yusuke Shinyama	c8b6d4112a	Fixed: crash with negative layout bbox.	2013-11-09 15:10:14 +09:00
Yusuke Shinyama	2b56b2eedf	Merged.	2013-11-07 19:50:41 +09:00
Matthew Duggan	2caa5edc25	PEP8: Whitespace changes to match pep8	2013-11-07 17:35:04 +09:00
Matthew Duggan	c1da8b835c	PEP8: Remove trailing whitespace	2013-11-07 16:14:53 +09:00
Matthew Duggan	10a68c83bd	Remove unused imports identified by pyflakes	2013-11-07 16:09:44 +09:00
Yusuke Shinyama	4ef81ae9d8	Improved word spacing.	2013-11-05 18:25:19 +09:00
Yusuke Shinyama	e927bd307e	fixed: https://github.com/euske/pdfminer/issues/8	2013-10-22 18:24:39 +09:00
Yusuke Shinyama	0ea08890d4	renamed: python2 -> python.	2013-10-17 23:05:27 +09:00
Yusuke Shinyama	eabe72ee63	Prevent crash with empty layout box.	2013-10-09 22:13:22 +09:00
jcushman	f77f196cd3	2x faster group_textboxes function.	2012-06-22 18:11:45 -03:00
Yusuke Shinyama	f638784e1d	experimental layout analysis improvements	2011-08-14 09:44:21 +09:00
Yusuke Shinyama	c134596e2f	code cleanup and testcase stabilization	2011-05-15 01:22:19 +09:00
Yusuke Shinyama	e5d02f8653	fixed the infinite recursion bug.	2011-05-14 16:32:09 +09:00
Yusuke Shinyama	0c41b8348e	code cleanup	2011-05-14 15:51:40 +09:00
Yusuke Shinyama	038ce4cd0c	added LTText.get_text() and .text property is no longer accessible.	2011-05-14 15:45:08 +09:00
Yusuke Shinyama	5004e4b28d	layout analysis speedup.	2011-05-14 14:17:39 +09:00
Yusuke Shinyama	8f9684f6a6	code cleanup: layout analysis	2011-04-21 22:07:04 +09:00
Yusuke Shinyama	0e660dd385	rename: LTPolygon -> LTCurve	2011-04-20 22:05:25 +09:00
Yusuke Shinyama	bb26cf9180	eliminate empty textboxes	2011-03-01 20:47:20 +09:00
Yusuke Shinyama	a8bf9b159e	docstring fix	2011-02-27 13:09:12 +09:00
Yusuke Shinyama	cabaa10e4f	layout analysis improvement	2011-02-27 12:56:28 +09:00
Yusuke Shinyama	f00f1dbd04	better layout analysis	2011-02-14 23:41:23 +09:00
Yusuke Shinyama	cd412308bd	text flow detection bug fix (thanks to fujimoto-san)	2011-02-14 22:32:55 +09:00


101 Commits (78f06225b6e42aa5878e51521c8f2a7b97efa7ea)