Venelin Stoykov
c2432c32f1
Fix assert message for PDFLayoutAnalyzer.end_page ( #80 )
...
stack is undefined
2017-08-18 08:08:08 +02:00
Hugh Secker-Walker
488545ddc7
Add string expressions to asserts showing local data ( #67 )
2017-05-29 09:06:09 +02:00
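A minimal sketch of the pattern this commit title describes: giving an assert a message expression that shows the offending local data. The helper `pop_page` and its `stack` argument are hypothetical names for illustration and are not taken from the pdfminer source.

```python
def pop_page(stack):
    """Hypothetical helper: pop the last item, asserting with local data."""
    # The expression after the comma is evaluated only when the assert fails,
    # so the failure message shows what was actually passed in.
    assert isinstance(stack, list), str(type(stack))
    assert len(stack) > 0, "stack is empty: %r" % (stack,)
    return stack.pop()
```

With this form, a failing assert reports the actual type or value instead of a bare AssertionError.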
Philippe Guglielmetti
52feb22eeb
Merge remote-tracking branch 'origin/master'
...
Conflicts:
MANIFEST.in
README.md
pdfminer/latin_enc.py
pdfminer/pdfdocument.py
pdfminer/pdfinterp.py
pdfminer/pdfpage.py
pdfminer/pdftypes.py
pdfminer/psparser.py
pdfminer/utils.py
samples/Makefile
setup.py
2017-01-19 08:03:16 +01:00
Humberto Pereira
e6ad15af79
Added painting information ( #37 )
...
* added color support to stroking and non stroking color spaces
* extended LTCurve, LTLine and LTRect to save painting information
* modified PDFLayoutAnalyzer to populate the shapes with painting information
2016-11-08 20:01:58 +01:00
Antonio Ercole De Luca
0fdebc6739
Removing all the "#!/usr/bin/env python" lines, they do not need for … ( #34 )
...
* Removed all the "#!/usr/bin/env python" lines, which are not needed for python3; this resolves issue #19.
* Restored all the shebangs in the tools and tests folders (because they are real executables), but used "#!/usr/bin/env python" instead of "#!/usr/bin/python", as this blog post points out: https://www.peterbe.com/plog/importance-of-env
Also removed the shebang from the pdfminer/psparser.py file.
2016-11-08 20:01:11 +01:00
Jakub Wilk
5ddbecb551
Fix typos
2016-09-13 16:25:09 +02:00
Friedrich Lindenberg
1d54ecd31c
Make the logger run in a namespace.
2016-05-20 21:12:05 +02:00
Cathal Garvey
a2ad7a6d03
Fixed some bugs preventing all tests from passing in Py2.
2015-05-30 18:02:29 +01:00
Cathal Garvey
08cb217983
Progress, progress.. not nearly atomic enough, sorry.
2015-05-30 16:14:24 +01:00
Cathal Garvey
1b47bed306
Many changes to make pdf2txt.py work better in Py3, some in that script, others in module!
...
Sorry, changes should have been more atomic.
*In pdf2txt.py:*
* Re-wrote main function to use argparse instead of optparse.
* Manually tested in Py2/Py3 to get partial consistency.
* Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway.
* Py2 mode *probably* unchanged, cannot find any bugs yet...
* Kept old main function for posterity, for now.
*In utils:*
* Added a few compatibility functions (some string hax required chardet, new dependency):
- make_compat_bytes(in_str)-> (py3->bytes | py2->str)
- make_compat_str(in_str)-> (str)
- compatible_encode_method(bytesorstring, encoding, erraction)-> (str)
*In pdfdevice:*
* To handle different output filetypes in Py3, injected lots of calls to new utils methods,
as well as some six.PYX checks and logic. These changes are largely responsible for
enhanced Py2/Py3 consistency.
*In converter:*
* To handle output filetypes in Py2, injected a few checks and fixes particularly around the
py2 `str.encode` method and its *assumed* usual use-analogies in Py3.
2015-05-17 21:08:57 +01:00
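The three compatibility helpers listed under "In utils" in the entry above could look roughly like the following. This is a sketch built only from the signatures in the commit message, not the code that was committed: the UTF-8 default, the default arguments, and the use of `sys.version_info` in place of `six` and `chardet` are all assumptions.

```python
import sys

PY3 = sys.version_info[0] >= 3


def make_compat_bytes(in_str):
    """Return bytes on Py3; return the str unchanged on Py2."""
    if PY3 and isinstance(in_str, str):
        return in_str.encode("utf-8")  # assumption: UTF-8
    return in_str


def make_compat_str(in_str):
    """Return the native str type, decoding bytes on Py3 if needed."""
    if PY3 and isinstance(in_str, bytes):
        # assumption: fixed UTF-8 decoding instead of chardet detection
        return in_str.decode("utf-8", "replace")
    return in_str


def compatible_encode_method(bytesorstring, encoding="utf-8", erraction="ignore"):
    """Imitate Py2's str.encode call style and return a native str."""
    if not PY3:
        return bytesorstring.encode(encoding, erraction)
    if isinstance(bytesorstring, bytes):
        return bytesorstring.decode(encoding, erraction)
    return bytesorstring
```

The point of such helpers is that calling code can stay identical on both interpreters while the bytes/str decision is made in one place.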
Yusuke Shinyama
14fd0fd2d6
Fixed: #84 (fontname was in unicode)
2015-04-05 19:02:02 +09:00
cybjit
51a361c145
clean up HTMLConverter and XMLConverter encoding
2014-09-16 22:57:00 +02:00
cybjit
39942b6642
avoid string formatting when not logging
2014-09-12 00:29:31 +02:00
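The idea named in the entry above can be illustrated with the standard `logging` module: pass the arguments to the logger instead of formatting the message up front, so nothing is rendered when the level is disabled. The logger name and the `Expensive` class are hypothetical and exist only to make the difference observable; this is not the code from the commit.

```python
import logging

logger = logging.getLogger("pdfminer.example")  # hypothetical logger name
logger.setLevel(logging.WARNING)                # DEBUG is disabled here


class Expensive(object):
    """Counts how many times it is rendered to a string."""

    def __init__(self):
        self.renders = 0

    def __repr__(self):
        self.renders += 1
        return "<Expensive>"


def log_eager(obj):
    # The % operator builds the message before logger.debug is even called.
    logger.debug("obj: %r" % (obj,))


def log_lazy(obj):
    # The logger formats the message only if the record is actually emitted.
    logger.debug("obj: %r", obj)
```

Calling `log_eager` increments `renders` even though nothing is logged, while `log_lazy` leaves it untouched.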
cybjit
f9a67db89b
change xrange to range
2014-09-07 18:36:12 +02:00
cybjit
0a2d90c051
pdf2txt: do not double encode stdout
2014-09-07 18:34:11 +02:00
unknown
29c07ea770
Python 3.4 support and tests
2014-09-03 15:26:08 +02:00
Yusuke Shinyama
8791355e1d
Cleanup imports. Use relative imports.
2014-06-26 18:12:39 +09:00
Yusuke Shinyama
44074b42ea
Added: stripcontrol for XMLConverter (-S option)
2014-06-22 00:33:00 +09:00
Yusuke Shinyama
1384a3fe8d
Code cleanup: removed some debug flags.
2014-06-14 15:43:10 +09:00
Yusuke Shinyama
8e14ebf4e1
Use logging module instead of print.
2014-06-14 12:00:49 +09:00
Yusuke Shinyama
2b56b2eedf
Merged.
2013-11-07 19:50:41 +09:00
Matthew Duggan
2caa5edc25
PEP8: Whitespace changes to match pep8
2013-11-07 17:35:04 +09:00
Matthew Duggan
c1da8b835c
PEP8: Remove trailing whitespace
2013-11-07 16:14:53 +09:00
Matthew Duggan
10a68c83bd
Remove unused imports identified by pyflakes
2013-11-07 16:09:44 +09:00
Yusuke Shinyama
02ad086f6a
fixed: HTMLConverter.
2013-10-25 18:10:40 +09:00
Yusuke Shinyama
0ea08890d4
renamed: python2 -> python.
2013-10-17 23:05:27 +09:00
Yusuke Shinyama
82ff98c7b3
imagewriter now works with text output
2011-11-07 01:15:10 +10:00
Yusuke Shinyama
dc8fde0e47
added CCITTFaxFilter support and a very crude image extraction.
2011-07-18 21:07:00 +10:00
Yusuke Shinyama
170c97a12b
colorspace patch by Lieb Simon
2011-06-06 17:10:12 +09:00
Yusuke Shinyama
0c41b8348e
code cleanup
2011-05-14 15:51:40 +09:00
Yusuke Shinyama
038ce4cd0c
added LTText.get_text() and .text property is no longer accessible.
2011-05-14 15:45:08 +09:00
Yusuke Shinyama
095534b294
figure object now does not call analyze.
2011-05-14 14:17:22 +09:00
Yusuke Shinyama
0e660dd385
rename: LTPolygon -> LTCurve
2011-04-20 22:05:25 +09:00
Yusuke Shinyama
dab70855bf
LTLine is now strictly horizontal or vertical.
2011-04-20 22:01:54 +09:00
Jonathan J Hunt
ec682539da
Optimized memory usage in TextConverter by ignoring all drawing commands.
2011-03-07 15:11:31 +10:00
Yusuke Shinyama
7dbb664db3
code cleanup and more debugging options
2011-02-14 23:42:05 +09:00
Yusuke Shinyama
b2d13db29a
code cleanup
2011-02-14 22:51:20 +09:00
Yusuke Shinyama
4eb6083c09
code cleanup
2011-01-03 18:11:22 +09:00
Yusuke Shinyama
3da3adad9b
method renamed: finish(self) -> analyze(self, laparams).
2010-12-26 16:56:21 +09:00
yusuke.shinyama.dummy
84ed94aec0
another bugfix
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@281 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:41:03 +00:00
yusuke.shinyama.dummy
9bba7ac08b
oops, forgot to fix this
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@280 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:40:58 +00:00
yusuke.shinyama.dummy
9f78915ea6
show cid for unknown characters
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@275 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:19 +00:00
yusuke.shinyama.dummy
7374b81383
htmlconverter improved
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@274 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 15:04:28 +00:00
yusuke.shinyama.dummy
fb4ce96309
add font-family
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@273 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:50 +00:00
yusuke.shinyama.dummy
476ecf7e32
add html exact layout mode; default changed.
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@272 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:41 +00:00
yusuke.shinyama.dummy
9584845358
layout analysis improved
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@268 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:05 +00:00
yusuke.shinyama.dummy
edbd3764a7
html layout output fix
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@267 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:48 +00:00
yusuke.shinyama.dummy
509ab66319
stay with python2
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy
cc139db8a7
bugfix LTChar.is_vertical undefined. verticality is now handled by LTTextBox
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@254 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:23 +00:00
yusuke.shinyama.dummy
3305c07ba2
layout analysis improved
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@245 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:39 +00:00