speedplane
45170e7183
There are a number of relatively complex changes here. Comments are in order of where the change appears.
...
1.
When detecting text in a horizontal line, we already add a space between words if they are separated by more than word_margin. Now we only do so if there is not already a space there. This prevents multiple spaces from being placed between words.
2.
Detect a horizontal line if the line is zero width. This improves our detection of horizontal lines when looking for both horizontal and vertical lines.
3.
Don't detect a vertical line if the previous letter is whitespace. This prevents double spaces from being caught as vertical lines.
4.
Improve upon an unfortunate O(N^2) algorithm which I have seen take many minutes to execute. Unfortunately, while the "fix" reduces algorithmic complexity, it isn't technically correct, so we only use it when we know the original algorithm will take a long time.
2014-12-12 00:36:59 -05:00
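Item 1 of the 45170e7183 message above can be illustrated with a minimal sketch. This is not the code from the commit: the function name join_chars, the (text, x0, x1) tuple layout, and the way the margin is scaled by character width are all assumptions made for illustration. Only the word_margin name and the "skip if a space is already there" rule come from the message.

```python
def join_chars(chars, word_margin):
    """Join characters on one horizontal line into a string.

    chars: list of (text, x0, x1) tuples sorted left to right.
    Hypothetical helper for illustration; not pdfminer's actual API.
    """
    out = []
    prev_x1 = None
    for text, x0, x1 in chars:
        if prev_x1 is not None:
            gap = x0 - prev_x1
            # Assumption: the threshold scales with the character width.
            margin = word_margin * (x1 - x0)
            # Insert a space only when the gap is wide enough AND no space
            # is already present on either side of the gap.
            already_spaced = out[-1].isspace() or text.isspace()
            if gap > margin and not already_spaced:
                out.append(" ")
        out.append(text)
        prev_x1 = x1
    return "".join(out)


if __name__ == "__main__":
    line = [("a", 0, 10), ("b", 10, 20), (" ", 20, 25), ("c", 40, 50)]
    print(repr(join_chars(line, 0.1)))
```

Without the already_spaced check, the wide gap before "c" would add a second space after the explicit one, which is the double-space case the commit message describes.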
Yusuke Shinyama
8791355e1d
Cleanup imports. Use relative imports.
2014-06-26 18:12:39 +09:00
Yusuke Shinyama
2e900e5d10
Fixed for consistent test results. (hopefully...)
2014-06-26 17:41:31 +09:00
Yusuke Shinyama
0387a6c260
Removed: tuple-unpacking args.
2014-06-15 12:12:13 +09:00
Yusuke Shinyama
a8ec99a848
More autotest tweaks.
2014-06-15 10:52:59 +09:00
Yusuke Shinyama
1384a3fe8d
Code cleanup: removed some debug flags.
2014-06-14 15:43:10 +09:00
Yusuke Shinyama
8e8e22c095
Fixed a layout bug introduced at c97ec304.
2014-06-13 23:05:04 +09:00
Yusuke Shinyama
340387bfc6
Cleanup: isinstance
2014-03-28 17:50:59 +09:00
Yusuke Shinyama
c97ec3048e
Changed / to // for clarity.
2013-11-26 21:35:16 +09:00
Yusuke Shinyama
acad011e3f
Code cleanup.
2013-11-11 20:46:30 +09:00
Yusuke Shinyama
cbef967fbf
Renamed: LTAnon -> LTAnno
2013-11-11 19:17:45 +09:00
Yusuke Shinyama
c8b6d4112a
Fixed: crash with negative layout bbox.
2013-11-09 15:10:14 +09:00
Yusuke Shinyama
2b56b2eedf
Merged.
2013-11-07 19:50:41 +09:00
Matthew Duggan
2caa5edc25
PEP8: Whitespace changes to match pep8
2013-11-07 17:35:04 +09:00
Matthew Duggan
c1da8b835c
PEP8: Remove trailing whitespace
2013-11-07 16:14:53 +09:00
Matthew Duggan
10a68c83bd
Remove unused imports identified by pyflakes
2013-11-07 16:09:44 +09:00
Yusuke Shinyama
4ef81ae9d8
Improved word spacing.
2013-11-05 18:25:19 +09:00
Yusuke Shinyama
e927bd307e
fixed: https://github.com/euske/pdfminer/issues/8
2013-10-22 18:24:39 +09:00
Yusuke Shinyama
0ea08890d4
renamed: python2 -> python.
2013-10-17 23:05:27 +09:00
Yusuke Shinyama
eabe72ee63
Prevent crash with empty layout box.
2013-10-09 22:13:22 +09:00
jcushman
f77f196cd3
2x faster group_textboxes function.
2012-06-22 18:11:45 -03:00
Yusuke Shinyama
f638784e1d
experimental layout analysis improvements
2011-08-14 09:44:21 +09:00
Yusuke Shinyama
c134596e2f
code cleanup and testcase stabilization
2011-05-15 01:22:19 +09:00
Yusuke Shinyama
e5d02f8653
fixed the infinite recursion bug.
2011-05-14 16:32:09 +09:00
Yusuke Shinyama
0c41b8348e
code cleanup
2011-05-14 15:51:40 +09:00
Yusuke Shinyama
038ce4cd0c
added LTText.get_text() and .text property is no longer accessible.
2011-05-14 15:45:08 +09:00
Yusuke Shinyama
5004e4b28d
layout analysis speedup.
2011-05-14 14:17:39 +09:00
Yusuke Shinyama
8f9684f6a6
code cleanup: layout analysis
2011-04-21 22:07:04 +09:00
Yusuke Shinyama
0e660dd385
rename: LTPolygon -> LTCurve
2011-04-20 22:05:25 +09:00
Yusuke Shinyama
bb26cf9180
eliminate empty textboxes
2011-03-01 20:47:20 +09:00
Yusuke Shinyama
a8bf9b159e
docstring fix
2011-02-27 13:09:12 +09:00
Yusuke Shinyama
cabaa10e4f
layout analysis improvement
2011-02-27 12:56:28 +09:00
Yusuke Shinyama
f00f1dbd04
better layout analysis
2011-02-14 23:41:23 +09:00
Yusuke Shinyama
cd412308bd
text flow detection bug fix (thanks to fujimoto-san)
2011-02-14 22:32:55 +09:00
Yusuke Shinyama
cbd58121e3
fix aggressive vertical writing detection (which ruins layout)
2011-02-02 23:09:34 +09:00
Yusuke Shinyama
a24c452ba2
boxes_flow patch by Daniel Gerber
2010-12-26 17:26:39 +09:00
Yusuke Shinyama
3da3adad9b
method renamed: finish(self) -> analyze(self, laparams).
2010-12-26 16:56:21 +09:00
yusuke.shinyama.dummy
476ecf7e32
add html exact layout mode; default changed.
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@272 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:41 +00:00
yusuke.shinyama.dummy
0d1f00fa9b
improved layout analysis for vertical script
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@269 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:14 +00:00
yusuke.shinyama.dummy
9584845358
layout analysis improved
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@268 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:05 +00:00
yusuke.shinyama.dummy
edbd3764a7
html layout output fix
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@267 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:48 +00:00
yusuke.shinyama.dummy
509ab66319
stay with python2
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy
438b4953be
documentation bit and code cleanup
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@263 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:49 +00:00
yusuke.shinyama.dummy
3305c07ba2
layout analysis improved
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@245 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:39 +00:00
yusuke.shinyama.dummy
bc1303e901
layout analysis improvement 1
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@244 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:33 +00:00
yusuke.shinyama.dummy
0944cfaded
test file simple3.pdf added.
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@240 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:41 +00:00
yusuke.shinyama.dummy
83d2086f19
fix minor layout issue
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@239 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:31 +00:00
yusuke.shinyama.dummy
4554705881
glyphlist bug (due to my misunderstanding of spec.)
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@237 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-26 15:02:46 +00:00
yusuke.shinyama.dummy
ac74542d1f
minor bugfixes
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@234 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-26 15:02:29 +00:00
yusuke.shinyama.dummy
f5aff374fc
some wordings and documentations
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@229 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 03:56:50 +00:00