speedplane
45170e7183
There are a number of relatively complex changes here. Comments are in order of where the change appears.
1.
When detecting text in a horizontal line, we already add a space between words that are more than word_margin apart. Now we only do so if there is not already a space there, which prevents multiple spaces from being placed between words.
2.
Detect a horizontal line if the line is zero width. This improves detection of horizontal lines when looking for both horizontal and vertical lines.
3.
Don't detect a vertical line if the previous letter is whitespace. This prevents double spaces from being caught as vertical lines.
4.
Improve upon an unfortunate O(N^2) algorithm that I have seen take many minutes to execute. Unfortunately, while the "fix" reduces algorithmic complexity, it isn't technically correct, so we only apply it when we know things will take a long time.
2014-12-12 00:36:59 -05:00
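Item 1 of commit 45170e7183 above can be illustrated with a minimal sketch. This is not the code from the commit: the `Glyph` tuple, the `join_line` function, and the use of an absolute gap in place of the library's own `word_margin` handling are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical glyph record: (text, left edge x0, right edge x1).
Glyph = Tuple[str, float, float]


def join_line(glyphs: List[Glyph], word_margin: float) -> str:
    """Join glyphs of one horizontal line, inserting at most one space per gap.

    Assumption: word_margin is treated as an absolute distance here,
    which simplifies whatever scaling the real layout code applies.
    """
    out: List[str] = []
    prev_x1: Optional[float] = None
    for text, x0, x1 in glyphs:
        gap_is_wide = prev_x1 is not None and (x0 - prev_x1) > word_margin
        # Skip the synthetic space if either neighbour is already whitespace.
        already_spaced = (bool(out) and out[-1].isspace()) or text.isspace()
        if gap_is_wide and not already_spaced:
            out.append(" ")
        out.append(text)
        prev_x1 = x1
    return "".join(out)


# A wide gap with no existing space gets exactly one space inserted.
plain = join_line([("a", 0, 1), ("b", 1, 2), ("c", 5, 6)], word_margin=1.0)
# A wide gap that follows an explicit space glyph does not get a second one.
spaced = join_line([("a", 0, 1), (" ", 1, 2), ("b", 5, 6)], word_margin=1.0)
```

Checking both the previously emitted character and the incoming one covers an explicit space on either side of the gap.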
speedplane
c32550dd4a
Merge branch 'fix-makefile'
2014-12-11 00:54:14 -05:00
speedplane
5cbdd915c7
Remove the dependency on python2. Also allow tests to run on Cygwin by detecting it and converting line endings (unix2dos).
2014-12-11 00:53:33 -05:00
speedplane
830b2403e2
Merge branch 'euske-main/master'
2014-12-11 00:06:46 -05:00
Yusuke Shinyama
0112112458
Fixed: crash on invalid chr number.
2014-12-09 22:55:47 +09:00
Yusuke Shinyama
75206ba18d
Removed: .gitignore
2014-12-09 22:49:13 +09:00
Yusuke Shinyama
4b585221e2
Merge pull request #76 from speedplane/master
Fix Unicode Bug + Add GitIgnore + Add Debug Flags
2014-12-09 22:22:33 +09:00
Philippe Guglielmetti
448aa08bc4
Merge pull request #4 from enkore/master
Fix utils.decode_text
2014-12-05 09:58:58 +01:00
enkore
d0379a2c44
Fix utils.decode_text
2014-12-04 17:09:52 +01:00
speedplane
36977fbe08
Add debug flags for much of the debug output.
2014-11-11 23:36:58 -05:00
speedplane
1067cb9f9f
Use a .gitignore file.
2014-11-11 23:36:26 -05:00
speedplane
ecc4d05675
Fix a unicode conversion bug.
See https://github.com/euske/pdfminer/issues/75
2014-11-11 23:34:33 -05:00
Philippe Guglielmetti
0e40264071
Merge pull request #3 from Cybjit/master
Samples and latin1 passwords
2014-09-17 07:22:52 +02:00
cybjit
515687e1bb
more xrange to range
2014-09-16 23:17:31 +02:00
cybjit
2639b15ef4
guess argv encoding in py2 using sys.stdin.encoding
2014-09-16 23:17:26 +02:00
cybjit
9b2e29396b
apply_png_predictor py3
2014-09-16 22:59:29 +02:00
cybjit
ad05121c69
password py3
2014-09-16 22:59:00 +02:00
cybjit
14585987c3
keep password api unicode, latin1 or utf-8 is encoded in handler
2014-09-16 22:58:25 +02:00
cybjit
2260f77b19
fix dict_value usage in strict mode
2014-09-16 22:57:29 +02:00
cybjit
51a361c145
clean up HTMLConverter and XMLConverter encoding
2014-09-16 22:57:00 +02:00
cybjit
2ee7153f6e
add python3 in sample Makefile
2014-09-16 22:56:13 +02:00
Goulu
f577f76c52
renamed as pdfminer.six in PyPI
2014-09-15 11:10:00 +02:00
Goulu
03de0f4db8
forgot 'six' requirement ...
2014-09-15 10:42:08 +02:00
Goulu
8861d7e0ed
version 20140915 pushed to PyPI as pdfminer_six
2014-09-15 10:33:04 +02:00
Philippe Guglielmetti
4f8aa9ff5b
Merge pull request #2 from Cybjit/master
CMap fixes and speed improvements
2014-09-12 07:33:06 +02:00
cybjit
714423883c
setup logging for pdf2txt and fix dumppdf
2014-09-12 00:29:31 +02:00
cybjit
39942b6642
avoid string formatting when not logging
2014-09-12 00:29:31 +02:00
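The idea behind commit 39942b6642 above (skipping string formatting when a message will not be logged) can be sketched with the standard `logging` module. This shows the general technique only, not the change made in the commit. The `Costly` class and the logger name are hypothetical.

```python
import logging


class Costly:
    """Hypothetical object with an expensive repr; counts how often it is rendered."""

    def __init__(self) -> None:
        self.renders = 0

    def __repr__(self) -> str:
        self.renders += 1
        return "Costly()"


log = logging.getLogger("lazy_logging_sketch")
log.setLevel(logging.WARNING)  # DEBUG messages are disabled for this logger

# Eager: the string is built before debug() is called, then thrown away.
eager = Costly()
log.debug("token: %r" % eager)

# Lazy: arguments are passed separately, so formatting is skipped when DEBUG is off.
lazy = Costly()
log.debug("token: %r", lazy)

# Guarded: useful when even computing the arguments is expensive.
guarded = Costly()
if log.isEnabledFor(logging.DEBUG):
    log.debug("token: %s", repr(guarded))
```

Only the eager form renders the object while DEBUG is disabled, which is the cost the commit message describes avoiding.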
cybjit
01821c7d1e
rename bytes to avoid built-in collision
2014-09-12 00:29:31 +02:00
cybjit
31e6afc7cf
faster and simpler bytes implementation
2014-09-12 00:29:30 +02:00
cybjit
ed13f7c47d
conv_cmap py3 compat
2014-09-12 00:29:30 +02:00
cybjit
cba5a42ba8
decipher_all bytes
2014-09-12 00:29:30 +02:00
cybjit
6357e2da80
code2cid uses int, not byte
2014-09-12 00:29:27 +02:00
cybjit
9b0a3ee53e
decode cmap font name
2014-09-11 23:30:02 +02:00
Philippe Guglielmetti
7b620b3146
Merge pull request #1 from Cybjit/master
Python 3 text conversion issues
2014-09-09 20:42:37 +02:00
cybjit
a6f31a713d
cmap bytes and decode
2014-09-07 18:41:04 +02:00
cybjit
cc733c8217
fixes for ARC4
2014-09-07 18:38:22 +02:00
cybjit
f9a67db89b
change xrange to range
2014-09-07 18:36:12 +02:00
cybjit
0a2d90c051
pdf2txt: do not double encode stdout
2014-09-07 18:34:11 +02:00
unknown
28c2a4e6ad
2.7/3.4 encoding corrected
2014-09-04 10:31:33 +02:00
unknown
58b8492783
no logging in travis.ci
2014-09-04 10:19:50 +02:00
unknown
1c93468c7e
faster, less verbose tests
2014-09-04 10:02:29 +02:00
unknown
7b610b34be
tools must be a module to enable scripts tests
2014-09-04 09:47:33 +02:00
unknown
4ab48d1803
Python 3.4 compatibility + tests
2014-09-04 09:36:19 +02:00
unknown
29c07ea770
Python 3.4 support and tests
2014-09-03 15:26:08 +02:00
unknown
a6475b61b4
Python 3.4 support added and tested
2014-09-03 13:17:41 +02:00
unknown
846cd18186
Python 3.4 support
2014-09-02 15:49:46 +02:00
unknown
faea7291a8
tests pass under Py 2.7 and 3.4
2014-09-01 14:16:49 +02:00
Yusuke Shinyama
b0e035c24f
Style fix: always have an explicit return.
2014-07-15 21:38:29 +09:00
Yusuke Shinyama
f5b5e31921
Fixed: DecodeParms array support.
2014-07-09 19:07:27 +09:00
Yusuke Shinyama
137fc3a1ae
Use KWD instead of token.name.
2014-06-30 19:15:21 +09:00