Friedrich Lindenberg
1d54ecd31c
Make the logger run in a namespace.
2016-05-20 21:12:05 +02:00
Cathal Garvey
a2ad7a6d03
Fixed some bugs preventing all tests from passing in Py2.
2015-05-30 18:02:29 +01:00
Cathal Garvey
08cb217983
Progress, progress.. not nearly atomic enough, sorry.
2015-05-30 16:14:24 +01:00
Cathal Garvey
1b47bed306
Many changes to make pdf2txt.py work better in Py3, some in that script, others in module!
...
Sorry, changes should have been more atomic.
*In pdf2txt.py:*
* Re-wrote main function to use argparse instead of optparse.
* Manually tested in Py2/Py3 to get partial consistency.
* Errors abound including Tags mode, but most modes weren't working at all in Py3 anyway.
* Py2 mode *probably* unchanged, cannot find any bugs yet...
* Kept old main function for posterity, for now.
*In utils:*
* Added a few compatibility functions (some string hax required chardet, new dependency):
- make_compat_bytes(in_str)-> (py3->bytes | py2->str)
- make_compat_str(in_str)-> (str)
- compatible_encode_method(bytesorstring, encoding, erraction)-> (str)
*In pdfdevice:*
* To handle different output filetypes in Py3, injected lots of calls to new utils methods,
as well as some six.PYX checks and logic. These changes are largely responsible for
enhanced Py2/Py3 consistency.
*In converter:*
* To handle output filetypes in Py2, injected a few checks and fixes particularly around the
py2 `str.encode` method and its *assumed* usual use-analogies in Py3.
2015-05-17 21:08:57 +01:00
cybjit
51a361c145
clean up HTMLConverter and XMLConverter encoding
2014-09-16 22:57:00 +02:00
cybjit
39942b6642
avoid string formating when not logging
2014-09-12 00:29:31 +02:00
cybjit
f9a67db89b
change xrange to range
2014-09-07 18:36:12 +02:00
cybjit
0a2d90c051
pdf2txt: do not double encode stdout
2014-09-07 18:34:11 +02:00
unknown
29c07ea770
Python 3.4 support and tests
2014-09-03 15:26:08 +02:00
Yusuke Shinyama
8791355e1d
Cleanup imports. Use relative imports.
2014-06-26 18:12:39 +09:00
Yusuke Shinyama
44074b42ea
Added: stripcontrol for XMLConverter (-S option)
2014-06-22 00:33:00 +09:00
Yusuke Shinyama
1384a3fe8d
Code cleanup: removed some debug flags.
2014-06-14 15:43:10 +09:00
Yusuke Shinyama
8e14ebf4e1
Use logging module instead of print.
2014-06-14 12:00:49 +09:00
Yusuke Shinyama
2b56b2eedf
Merged.
2013-11-07 19:50:41 +09:00
Matthew Duggan
2caa5edc25
PEP8: Whitespace changes to match pep8
2013-11-07 17:35:04 +09:00
Matthew Duggan
c1da8b835c
PEP8: Remove trailing whitespace
2013-11-07 16:14:53 +09:00
Matthew Duggan
10a68c83bd
Remove unused imports identified by pyflakes
2013-11-07 16:09:44 +09:00
Yusuke Shinyama
02ad086f6a
fixed: HTMLConverter.
2013-10-25 18:10:40 +09:00
Yusuke Shinyama
0ea08890d4
renamed: python2 -> python.
2013-10-17 23:05:27 +09:00
Yusuke Shinyama
82ff98c7b3
imagewriter now works with text output
2011-11-07 01:15:10 +10:00
Yusuke Shinyama
dc8fde0e47
added CCITTFaxFilter support and a very crude image extraction.
2011-07-18 21:07:00 +10:00
Yusuke Shinyama
170c97a12b
colorspace patch by Lieb Simon
2011-06-06 17:10:12 +09:00
Yusuke Shinyama
0c41b8348e
code cleanup
2011-05-14 15:51:40 +09:00
Yusuke Shinyama
038ce4cd0c
added LTText.get_text() and .text property is no longer accessible.
2011-05-14 15:45:08 +09:00
Yusuke Shinyama
095534b294
figure object now does not call analyze.
2011-05-14 14:17:22 +09:00
Yusuke Shinyama
0e660dd385
rename: LTPolygon -> LTCurve
2011-04-20 22:05:25 +09:00
Yusuke Shinyama
dab70855bf
LTLine is now strictly horizontal or vertical.
2011-04-20 22:01:54 +09:00
Jonathan J Hunt
ec682539da
Optimized memory usage in TextConverter by ignoring all drawing commands.
2011-03-07 15:11:31 +10:00
Yusuke Shinyama
7dbb664db3
code cleanup and more debugging options
2011-02-14 23:42:05 +09:00
Yusuke Shinyama
b2d13db29a
code cleanup
2011-02-14 22:51:20 +09:00
Yusuke Shinyama
4eb6083c09
code cleanup
2011-01-03 18:11:22 +09:00
Yusuke Shinyama
3da3adad9b
method renamed: finish(self) -> analyze(self, laparams).
2010-12-26 16:56:21 +09:00
yusuke.shinyama.dummy
84ed94aec0
another bugfix
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@281 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:41:03 +00:00
yusuke.shinyama.dummy
9bba7ac08b
oops, forgot to fix this
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@280 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:40:58 +00:00
yusuke.shinyama.dummy
9f78915ea6
show cid for unknown characters
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@275 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:19 +00:00
yusuke.shinyama.dummy
7374b81383
htmlconverter improved
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@274 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 15:04:28 +00:00
yusuke.shinyama.dummy
fb4ce96309
add font-family
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@273 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:50 +00:00
yusuke.shinyama.dummy
476ecf7e32
add html exect layout mode; default changed.
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@272 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:41 +00:00
yusuke.shinyama.dummy
9584845358
layout analysis improved
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@268 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:05 +00:00
yusuke.shinyama.dummy
edbd3764a7
html layout output fix
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@267 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:48 +00:00
yusuke.shinyama.dummy
509ab66319
stay with python2
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy
cc139db8a7
bugfix LTChar.is_vertical undefined. verticality is now handled by LTTextBox
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@254 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:23 +00:00
yusuke.shinyama.dummy
3305c07ba2
layout analysis improved
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@245 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:39 +00:00
yusuke.shinyama.dummy
cf52476f5e
remove redundancy
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@221 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-06 05:16:21 +00:00
yusuke.shinyama.dummy
fe3bdbfce0
text rise support added
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@217 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-05-18 14:57:04 +00:00
yusuke.shinyama.dummy
eb535d4106
change PDFPageAggregator -> PDFLayoutAnalyzer
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@213 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 13:31:21 +00:00
yusuke.shinyama.dummy
833f859449
move TagExtractor
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@212 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 13:31:11 +00:00
yusuke.shinyama.dummy
97848409e5
fix xobject resources bug, thanks to Jose Maria
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@209 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 04:32:03 +00:00
yusuke.shinyama.dummy
e77a6ba997
-A (all_texts) option added for layout analysis
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@205 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:30:03 +00:00
yusuke.shinyama.dummy
609c6e1f5f
rename: LayoutItem -> LTItem, LayoutContainer -> LTContainer
...
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@203 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:29:30 +00:00