Goulu
5a23fad6fd
Merge pull request #14 from orangain/close-device
Close device to write footer of xml/html files
2016-01-18 11:22:35 +01:00
Goulu
2103e5875e
Merge pull request #13 from orangain/include-cmap
Include compiled cmap resources to simplify installation for CJK languages
2016-01-18 11:22:08 +01:00
Steve Hair
92c71436b9
Improved settings management
2016-01-10 12:17:38 -05:00
orangain
f8a051adbd
Close device to write footer of xml/html files
2015-12-27 20:57:00 +09:00
orangain
f1d5d681b6
Include compiled cmap resources to simplify installation for CJK languages
* Run `make cmap` and `git add pdfminer/cmap`.
* Modify MANIFEST.in not to include cmaprsrc dir in the sdist package.
* Add pdfminer/cmap/README.txt to include license in the sdist package.
* Remove installation guide specific to CJK languages from README.md.
2015-12-27 13:32:29 +09:00
lucanaso
63bb3caec2
Fixed rendering of non-breaking spaces (cid:160)
As stated in the PDF specification ISO 32000-1, Annex D.2 (Latin Character Set and Encodings), pages 653 to 656 (available here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf ):
"The SPACE character shall also be encoded as 312 in MacRomanEncoding and as 240 in WinAnsiEncoding. This duplicate code shall signify a nonbreaking space; it shall be typographically the same as (U+003A) SPACE."
The duplicate key was missing, so PDFMiner returned the string "(cid:160)".
This fix adds the duplicate key to latin_enc.py.
glyphlist.py does not need to be modified, as it already contains a key for the non-breaking space: https://github.com/lucanaso/pdfminer/blob/master/pdfminer/glyphlist.py#L2755 .
2015-12-09 16:47:32 +01:00
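The codes in the quoted spec text are octal, which explains the "160": octal 240 is decimal 160, and octal 312 is decimal 202. The sketch below is a hypothetical miniature of an encoding table (the row layout and the helpers `build_decoder` and `render` are assumptions, not the contents of latin_enc.py) showing how a missing duplicate row makes a lookup fall back to "(cid:160)".

```python
# Codes quoted from the spec text are octal.
MAC_ROMAN_NBSP = 0o312  # 202 decimal
WIN_ANSI_NBSP = 0o240   # 160 decimal: the "160" in "(cid:160)"

# Hypothetical miniature of an encoding table (not latin_enc.py itself):
# (glyph name, MacRomanEncoding code, WinAnsiEncoding code)
LATIN_TABLE = [
    ("space", 32, 32),
    ("space", MAC_ROMAN_NBSP, WIN_ANSI_NBSP),  # duplicate key for the nbsp
    ("A", 65, 65),
]


def build_decoder(table, column):
    """Map code -> glyph name for one column (1 = MacRoman, 2 = WinAnsi)."""
    return {row[column]: row[0] for row in table if row[column] is not None}


def render(code, decoder):
    """Return the glyph name, or a '(cid:N)' fallback for unknown codes."""
    return decoder.get(code, "(cid:%d)" % code)
```

Dropping the duplicate row reproduces the symptom: `render(160, build_decoder(LATIN_TABLE[::2], 2))` gives `"(cid:160)"`. Python's own codecs agree on the character: `bytes([0o240]).decode("cp1252")` and `bytes([0o312]).decode("mac_roman")` are both U+00A0 NO-BREAK SPACE.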
Chris Hager
8149be1669
bugfixes
2015-12-06 00:17:58 +01:00
Chris Hager
2e1be5721f
removed settings.ENFORCE_CHECK_EXTRACTABLE
2015-11-01 22:34:18 +01:00
Chris Hager
b686dd0139
pdfminer/settings.py for STRICT and added ENFORCE_CHECK_EXTRACTABLE
2015-11-01 22:28:08 +01:00
Ivan Pozdeev
63c9378b8b
make ValueErrors descriptive
2015-08-10 03:14:51 +03:00
Alex Zagorodniuk
131cb1ea92
change STRICT to be a settings attribute
2015-06-22 19:08:35 -04:00
Goulu
623bd98452
Update __init__.py
version 20150601
2015-06-01 10:21:51 +02:00
Cathal Garvey
403711ed13
Whoops, forgot to version-gate chardet in the actual code. Thanks Travis!
2015-05-30 19:33:35 +01:00
Cathal Garvey
a2ad7a6d03
Fixed some bugs preventing all tests from passing in Py2.
2015-05-30 18:02:29 +01:00
Cathal Garvey
79c97ac221
Docstrings.
2015-05-30 17:16:06 +01:00
Cathal Garvey
3b7edba48c
Forgot to add the actual compartmentalised function..
2015-05-30 17:04:28 +01:00
Cathal Garvey
08cb217983
Progress, progress.. not nearly atomic enough, sorry.
2015-05-30 16:14:24 +01:00
Cathal Garvey
1b47bed306
Many changes to make pdf2txt.py work better in Py3, some in that script, others in module!
Sorry, changes should have been more atomic.
*In pdf2txt.py:*
* Re-wrote main function to use argparse instead of optparse.
* Manually tested in Py2/Py3 to get partial consistency.
* Errors abound, including in Tags mode, but most modes weren't working at all in Py3 anyway.
* Py2 mode *probably* unchanged, cannot find any bugs yet...
* Kept old main function for posterity, for now.
*In utils:*
* Added a few compatibility functions (some string hacks required chardet, a new dependency):
- make_compat_bytes(in_str)-> (py3->bytes | py2->str)
- make_compat_str(in_str)-> (str)
- compatible_encode_method(bytesorstring, encoding, erraction)-> (str)
*In pdfdevice:*
* To handle different output filetypes in Py3, injected lots of calls to new utils methods,
as well as some six.PYX checks and logic. These changes are largely responsible for
enhanced Py2/Py3 consistency.
*In converter:*
* To handle output filetypes in Py2, injected a few checks and fixes, particularly around the
py2 `str.encode` method and its *assumed* usual Py3 analogues.
2015-05-17 21:08:57 +01:00
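The two string helpers listed above can be approximated in Python 3 alone. This is a simplified sketch, not the project's code: it drops the Py2 branch and replaces the chardet guess with a UTF-8 attempt plus a Latin-1 fallback, which is an assumption of this sketch.

```python
def make_compat_bytes(in_str):
    """Return bytes: str is encoded as UTF-8, bytes pass through unchanged."""
    if isinstance(in_str, bytes):
        return in_str
    return in_str.encode("utf-8")


def make_compat_str(in_str):
    """Return str: bytes are decoded as UTF-8, falling back to Latin-1.

    The Latin-1 fallback (an assumption standing in for chardet here)
    never fails, because every byte maps to a code point.
    """
    if isinstance(in_str, str):
        return in_str
    try:
        return in_str.decode("utf-8")
    except UnicodeDecodeError:
        return in_str.decode("latin-1")
```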
Yusuke Shinyama
14fd0fd2d6
Fixed : #84 (fontname was in unicode)
2015-04-05 19:02:02 +09:00
speedplane
806ee603ff
More fixes to layout. The compute neighbors function for horizontal lines is only intended to find neighbors on differing lines. However, it's entirely possible that horizontal neighbors could appear.
This commit finds horizontal neighbors in a horizontal line and merges them into a single horizontal line if necessary. This leads to much better text extraction if the PDF was created in a funky way.
For example (test case coming), I have seen PDFs which are written almost like vertical columns, but the text is entirely horizontal.
2014-12-12 00:36:59 -05:00
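The merge step described above can be sketched for fragments that already sit on one horizontal line: sort them left to right, then join any whose gap is at most a margin. The fragment tuple and `merge_horizontal_neighbors` are hypothetical; the commit does not spell out its exact criteria.

```python
from typing import List, Tuple

# Hypothetical fragment on one horizontal line: (x0, x1, text).
Fragment = Tuple[float, float, str]


def merge_horizontal_neighbors(frags: List[Fragment], margin: float) -> List[Fragment]:
    """Merge fragments whose horizontal gap is at most `margin`.

    Sorting by x0 first means the drawing order in the PDF (which may be
    column-like, as in the example above) does not affect the result.
    """
    merged: List[Fragment] = []
    for x0, x1, text in sorted(frags):
        if merged and x0 - merged[-1][1] <= margin:
            px0, px1, ptext = merged[-1]
            merged[-1] = (px0, max(px1, x1), ptext + text)
        else:
            merged.append((x0, x1, text))
    return merged
```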
speedplane
45170e7183
There are a number of relatively complex changes here. Comments are in order of where the change appears.
1.
When detecting text in a horizontal line, we already add a space between words if separated by more than word_margin apart. However now, we only do it if there is not already an existing space. This prevents multiple spaces being placed between words.
2.
Detect a horizontal line if the line is zero width. This improves our detection of horizontal lines when looking for both horizontal and vertical.
3.
Don't detect a vertical line if the previous letter is whitespace. Prevents double spaces being caught as vertical lines.
4.
Improve upon an unfortunate O(N^2) algorithm which I have seen taking many minutes to execute. Unfortunately, while the "fix" reduces algorithmic complexity, it isn't technically correct, so we only do it when we know things will take a long time.
2014-12-12 00:36:59 -05:00
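Change 1 above (add a space at a wide gap only when one is not already there) can be sketched as follows. The glyph tuple and `join_line` are hypothetical, and `word_margin` is treated as an absolute distance here, which is a simplification of this sketch.

```python
from typing import Iterable, Tuple

Char = Tuple[float, float, str]  # hypothetical (x0, x1, text) for one glyph


def join_line(chars: Iterable[Char], word_margin: float) -> str:
    """Join the glyphs of one horizontal line into a string.

    A space is inserted where the gap exceeds `word_margin`, but only if
    neither side of the gap is already whitespace (no double spaces).
    """
    out = []
    prev_x1 = None
    for x0, x1, text in chars:
        gap_is_wide = prev_x1 is not None and x0 - prev_x1 > word_margin
        if gap_is_wide and not out[-1][-1:].isspace() and not text[:1].isspace():
            out.append(" ")
        out.append(text)
        prev_x1 = x1
    return "".join(out)
```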
Yusuke Shinyama
0112112458
Fixed: crash on invalid chr number.
2014-12-09 22:55:47 +09:00
enkore
d0379a2c44
Fix utils.decode_text
2014-12-04 17:09:52 +01:00
speedplane
36977fbe08
Add debug flags for much of the debug output.
2014-11-11 23:36:58 -05:00
speedplane
ecc4d05675
Fix a unicode conversion bug.
See https://github.com/euske/pdfminer/issues/75
2014-11-11 23:34:33 -05:00
cybjit
515687e1bb
more xrange to range
2014-09-16 23:17:31 +02:00
cybjit
9b2e29396b
apply_png_predictor py3
2014-09-16 22:59:29 +02:00
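In PDF, Predictor values 10 to 15 select PNG prediction, where each row starts with its own filter-type byte (0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth). A standalone sketch of undoing those filters on Python 3 bytes; `png_unpredict` and its parameters are hypothetical and this is not the project's `apply_png_predictor`.

```python
def png_unpredict(data: bytes, columns: int, bpp: int = 1) -> bytes:
    """Undo PNG row filters (types 0-4).

    Each row is one filter-type byte followed by columns * bpp bytes, where
    bpp is bytes per pixel (colors * bits per component / 8, rounded up).
    """
    rowlen = columns * bpp
    prev = bytearray(rowlen)  # the row "above" the first row is all zeros
    out = bytearray()
    for i in range(0, len(data), rowlen + 1):
        ftype = data[i]
        if ftype > 4:
            raise ValueError("unsupported PNG filter type %d" % ftype)
        line = bytearray(data[i + 1:i + 1 + rowlen])
        for j in range(len(line)):
            a = line[j - bpp] if j >= bpp else 0  # left, already decoded
            b = prev[j]                            # up
            c = prev[j - bpp] if j >= bpp else 0   # upper-left
            if ftype == 1:
                line[j] = (line[j] + a) & 0xFF
            elif ftype == 2:
                line[j] = (line[j] + b) & 0xFF
            elif ftype == 3:
                line[j] = (line[j] + (a + b) // 2) & 0xFF
            elif ftype == 4:
                p = a + b - c
                pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
                pred = a if pa <= pb and pa <= pc else (b if pb <= pc else c)
                line[j] = (line[j] + pred) & 0xFF
        out += line
        prev = line
    return bytes(out)
```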
cybjit
ad05121c69
password py3
2014-09-16 22:59:00 +02:00
cybjit
14585987c3
keep password api unicode, latin1 or utf-8 is encoded in handler
2014-09-16 22:58:25 +02:00
cybjit
2260f77b19
fix dict_value usage in strict mode
2014-09-16 22:57:29 +02:00
cybjit
51a361c145
clean up HTMLConverter and XMLConverter encoding
2014-09-16 22:57:00 +02:00
Goulu
8861d7e0ed
version 20140915 pushed to PyPi as pdfminer_six
2014-09-15 10:33:04 +02:00
cybjit
39942b6642
avoid string formatting when not logging
2014-09-12 00:29:31 +02:00
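The idea behind avoiding string formatting when not logging is that the standard `logging` module defers %-interpolation until a record is actually emitted. A minimal sketch; the logger name and the `expensive_repr` helper are hypothetical stand-ins.

```python
import logging

log = logging.getLogger("pdf_sketch")
calls = []  # records each time the costly helper actually runs


def expensive_repr(obj):
    """Stand-in for a costly description of a parsed object (hypothetical)."""
    calls.append(obj)
    return repr(obj)


def process(obj):
    # Eager form, which always pays for the formatting:
    #     log.debug("details: %s" % expensive_repr(obj))
    # Lazy form: interpolation is deferred until the record is emitted.
    log.debug("processing %r", obj)
    # If computing an argument is itself costly, check the level first.
    if log.isEnabledFor(logging.DEBUG):
        log.debug("details: %s", expensive_repr(obj))
```

With the logger above DEBUG, `process()` never calls `expensive_repr`.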
cybjit
01821c7d1e
rename bytes to avoid built-in collision
2014-09-12 00:29:31 +02:00
cybjit
31e6afc7cf
faster and simpler bytes implementation
2014-09-12 00:29:30 +02:00
cybjit
cba5a42ba8
decipher_all bytes
2014-09-12 00:29:30 +02:00
cybjit
6357e2da80
code2cid uses int, not byte
2014-09-12 00:29:27 +02:00
cybjit
9b0a3ee53e
decode cmap font name
2014-09-11 23:30:02 +02:00
cybjit
a6f31a713d
cmap bytes and decode
2014-09-07 18:41:04 +02:00
cybjit
cc733c8217
fixes for ARC4
2014-09-07 18:38:22 +02:00
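For reference, ARC4 (RC4), the stream cipher used by the older revisions of the PDF standard security handler, is short enough to write out in full. This Python 3 sketch works on `bytes` directly, where indexing and iteration yield ints; it is the textbook algorithm, not necessarily the project's implementation.

```python
def arc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher; encryption and decryption are the same operation."""
    # Key-scheduling algorithm (KSA).
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation (PRGA), XORed with the input bytes.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)
```

Because the keystream is XORed in, `arc4(key, arc4(key, data)) == data`.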
cybjit
f9a67db89b
change xrange to range
2014-09-07 18:36:12 +02:00
cybjit
0a2d90c051
pdf2txt: do not double encode stdout
2014-09-07 18:34:11 +02:00
unknown
58b8492783
no logging in travis.ci
2014-09-04 10:19:50 +02:00
unknown
1c93468c7e
faster, less verbose tests
2014-09-04 10:02:29 +02:00
unknown
4ab48d1803
Python 3.4 compatibility + tests
2014-09-04 09:36:19 +02:00
unknown
29c07ea770
Python 3.4 support and tests
2014-09-03 15:26:08 +02:00
unknown
a6475b61b4
Python 3.4 support added and tested
2014-09-03 13:17:41 +02:00
unknown
846cd18186
Python 3.4 support
2014-09-02 15:49:46 +02:00
unknown
faea7291a8
tests pass under Py 2.7 and 3.4
2014-09-01 14:16:49 +02:00
Yusuke Shinyama
b0e035c24f
Style fix: always have an explicit return.
2014-07-15 21:38:29 +09:00
Yusuke Shinyama
f5b5e31921
Fixed: DecodeParms array support.
2014-07-09 19:07:27 +09:00
Yusuke Shinyama
137fc3a1ae
Use KWD instead of token.name.
2014-06-30 19:15:21 +09:00
Yusuke Shinyama
1ccfaff411
String-Bytes distinction (first attempt).
2014-06-30 19:05:56 +09:00
Yusuke Shinyama
8791355e1d
Cleanup imports. Use relative imports.
2014-06-26 18:12:39 +09:00
Yusuke Shinyama
2e900e5d10
Fixed for consistent test results. (hopefully...)
2014-06-26 17:41:31 +09:00
Yusuke Shinyama
fe86b4e64e
Changed: StringIO -> io.BytesIO
2014-06-25 19:55:41 +09:00
Yusuke Shinyama
44074b42ea
Added: stripcontrol for XMLConverter (-S option)
2014-06-22 00:33:00 +09:00
Yusuke Shinyama
81391c09f4
Fixed : #56 (with a derpy fix)
2014-06-18 19:11:45 +09:00
Yusuke Shinyama
bb866ae148
Changed: new except syntax (2.6 or above).
2014-06-16 18:50:07 +09:00
Yusuke Shinyama
28e96ba3d0
Use print as a function.
2014-06-15 12:14:33 +09:00
Yusuke Shinyama
0387a6c260
Removed: tuple-unpacking args.
2014-06-15 12:12:13 +09:00
Yusuke Shinyama
a8ec99a848
More autotest tweaks.
2014-06-15 10:52:59 +09:00
Yusuke Shinyama
1384a3fe8d
Code cleanup: removed some debug flags.
2014-06-14 15:43:10 +09:00
Yusuke Shinyama
d9680fca7e
Plane: preserve the object order so that the test result is always consistent.
2014-06-14 14:44:53 +09:00
Yusuke Shinyama
aed248610c
Fixed: dependency on pygame in a unittest.
2014-06-14 12:05:26 +09:00
Yusuke Shinyama
8e14ebf4e1
Use logging module instead of print.
2014-06-14 12:00:49 +09:00
Yusuke Shinyama
8e8e22c095
Fixed a layout bug introduced at c97ec304.
2014-06-13 23:05:04 +09:00
numion
a4997d6f10
Implement revision 4 and 5 encryption handler.
2014-05-19 16:27:43 +02:00
Michael R. Hines
ae2547b0f2
Stop throwing exception on LITERALS_DCT_DECODE
I have PDF documents with image streams that use two filters; don't throw an exception on the second one (DCT).
2014-05-14 13:25:30 +08:00
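A stream with two filters, such as Flate-compressed JPEG data, is decoded by applying its filters in order. The sketch below is hypothetical (helper name, plain-string filter names): it inflates with zlib and, on DCTDecode, returns the JPEG bytes unchanged instead of raising, which is the behaviour the commit asks for.

```python
import zlib
from typing import List


def decode_stream(data: bytes, filters: List[str]) -> bytes:
    """Apply a stream's filter list in order (hypothetical helper)."""
    for name in filters:
        if name == "FlateDecode":
            data = zlib.decompress(data)
        elif name == "DCTDecode":
            # JPEG data: hand it to an image writer as-is rather than fail.
            return data
        else:
            raise NotImplementedError("unsupported filter: %s" % name)
    return data
```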
Yusuke Shinyama
6b6fc264ff
Code refactoring: CMap and UnicodeMap both inherit CMapBase.
2014-04-16 18:57:16 +09:00
Yusuke Shinyama
b09c37902f
Fixed: issue #48 (thanks to speedplane)
2014-04-09 17:55:50 +09:00
Yusuke Shinyama
7b354c7ab3
Version 20140328
2014-03-28 22:49:18 +09:00
Yusuke Shinyama
340387bfc6
Cleanup: isinstance
2014-03-28 17:50:59 +09:00
Yusuke Shinyama
7849c8724a
Fixed: PDFXRefStream.get_objids returns invalid objids.
2014-03-28 17:29:26 +09:00
Yusuke Shinyama
57adad55d7
Revert the wrong fix.
2014-03-28 17:24:03 +09:00
Yusuke Shinyama
b18e8c549d
Version 20140327
2014-03-28 00:19:52 +09:00
Yusuke Shinyama
ee47a6603a
Fixed: issues #45
2014-03-28 00:18:17 +09:00
Yusuke Shinyama
ab03037444
Version 20140324
2014-03-24 21:03:46 +09:00
Yusuke Shinyama
4b2beba398
Code cleanup.
2014-03-24 20:59:24 +09:00
Yusuke Shinyama
f9079e4c0a
Fixed dumppdf.py issues.
2014-03-24 20:55:00 +09:00
Yusuke Shinyama
607be269ab
Applied a patch by Axel Kaiser.
2014-03-24 20:45:35 +09:00
Yusuke Shinyama
d7c4ff28e9
Applied a patch by Axel Kaiser.
2014-03-24 20:39:30 +09:00
Yusuke Shinyama
636d4caeb3
Fixed the PNG predictor bug. Thanks to Gabor Molnar.
2014-03-24 19:57:05 +09:00
Yusuke Shinyama
c97ec3048e
Changed / to // for clarity.
2013-11-26 21:35:16 +09:00
Yusuke Shinyama
b589da51b7
Fix for malformed PDFs.
2013-11-26 21:27:45 +09:00
Yusuke Shinyama
cf1e3c9973
Version bump!
2013-11-13 14:52:01 +09:00
Yusuke Shinyama
acad011e3f
Code cleanup.
2013-11-11 20:46:30 +09:00
Yusuke Shinyama
cbef967fbf
Renamed: LTAnon -> LTAnno
2013-11-11 19:17:45 +09:00
Yusuke Shinyama
c8b6d4112a
Fixed: crash with negative layout bbox.
2013-11-09 15:10:14 +09:00
Yusuke Shinyama
2b56b2eedf
Merged.
2013-11-07 19:50:41 +09:00
Matthew Duggan
2caa5edc25
PEP8: Whitespace changes to match pep8
2013-11-07 17:35:04 +09:00
Matthew Duggan
c1da8b835c
PEP8: Remove trailing whitespace
2013-11-07 16:14:53 +09:00
Matthew Duggan
024b821056
Make pyflakes happy by defining variable
2013-11-07 16:10:14 +09:00
Matthew Duggan
10a68c83bd
Remove unused imports identified by pyflakes
2013-11-07 16:09:44 +09:00
Yusuke Shinyama
4ef81ae9d8
Improved word spacing.
2013-11-05 18:25:19 +09:00
Yusuke Shinyama
02ad086f6a
fixed: HTMLConverter.
2013-10-25 18:10:40 +09:00
Yusuke Shinyama
87842233b3
Version bump!
2013-10-22 22:19:38 +09:00
Yusuke Shinyama
d3730a29ec
API change: process_pdf -> PDFPage.get_pages
2013-10-22 18:59:16 +09:00
Yusuke Shinyama
e927bd307e
fixed: https://github.com/euske/pdfminer/issues/8
2013-10-22 18:24:39 +09:00
Yusuke Shinyama
2aa757978b
Reverted to Python2.x syntax. Fixed LZW decoding.
2013-10-19 08:19:40 +09:00
Yusuke Shinyama
bfd9e93c12
Merge branch 'master' of https://github.com/JordanReiter/pdfminer into JordanReiter-master
2013-10-19 07:46:45 +09:00
Yusuke Shinyama
8e4c0c88e3
fixed: https://github.com/euske/pdfminer/issues/26
2013-10-17 23:20:08 +09:00
Yusuke Shinyama
0ea08890d4
renamed: python2 -> python.
2013-10-17 23:05:27 +09:00
Yusuke Shinyama
8d42eec94d
in_cmap is on by default.
2013-10-17 21:40:43 +09:00
Yusuke Shinyama
de9f9715e3
Added: Adobe-UCS
2013-10-17 21:35:25 +09:00
Yusuke Shinyama
1455f134c6
Fixed: missing ObjStm due to invalid seek.
2013-10-10 20:10:57 +09:00
Yusuke Shinyama
f85c374cae
Separated PDFPage to pdfpage.py.
2013-10-10 19:54:55 +09:00
Yusuke Shinyama
2df67d85ae
Expand ObjStm in XRefFallback.
2013-10-10 19:40:43 +09:00
Yusuke Shinyama
e4bc4e43b1
Code cleanup.
2013-10-10 19:17:58 +09:00
Yusuke Shinyama
cfd60eafbf
Removed PDFDocument.read_xref().
2013-10-10 18:57:08 +09:00
Yusuke Shinyama
658be970b8
Separated PDFXRefFallback.
2013-10-10 18:44:12 +09:00
Yusuke Shinyama
c926874d20
API Change: the PDFDocument cstr now takes PDFParser. set_parser() is removed.
2013-10-10 18:40:06 +09:00
Yusuke Shinyama
557c2c72e6
Removed ObjIdRange for terseness.
2013-10-10 18:34:43 +09:00
Yusuke Shinyama
2221163b94
Split pdfparser.py and pdfdocument.py.
2013-10-10 18:29:30 +09:00
Yusuke Shinyama
1467fc674c
Added fallback for broken PDFs.
2013-10-09 22:45:54 +09:00
Yusuke Shinyama
eabe72ee63
Prevent crash with empty layout box.
2013-10-09 22:13:22 +09:00
Yusuke Shinyama
87143cb36f
Fallback when /Pages does not exist.
2013-10-09 22:08:16 +09:00
Yusuke Shinyama
06425bba00
Introducing PDFObjectNotFound
2013-10-09 21:39:23 +09:00
Yusuke Shinyama
3c3cba2ecc
Moved: import PIL.
2013-04-09 18:42:32 +09:00
Yusuke Shinyama
19e7d70ac1
Merge pull request #15 from jcushman/patch-1
2x faster layout analysis: Use set instead of list for Plane's internal collection of objects.
2013-04-09 02:39:46 -07:00
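The speedup from keeping a Plane's objects in sets comes largely from removal: `list.remove` scans the list, while `set.discard` is a hash lookup. A minimal grid-bucket index illustrating the idea; `GridPlane` and its methods are hypothetical and are not pdfminer's Plane class.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, Set, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


class GridPlane:
    """Hypothetical spatial index: objects are bucketed into square cells.

    Each cell is a set, so add/remove cost O(1) per cell instead of the
    O(n) list.remove() a list-based cell would need.
    """

    def __init__(self, gridsize: float = 50.0):
        self.gridsize = gridsize
        self.cells: Dict[Tuple[int, int], Set[Hashable]] = defaultdict(set)
        self.bboxes: Dict[Hashable, BBox] = {}

    def _cells(self, bbox: BBox) -> Iterator[Tuple[int, int]]:
        x0, y0, x1, y1 = bbox
        g = self.gridsize
        for gx in range(int(x0 // g), int(x1 // g) + 1):
            for gy in range(int(y0 // g), int(y1 // g) + 1):
                yield gx, gy

    def add(self, obj: Hashable, bbox: BBox) -> None:
        self.bboxes[obj] = bbox
        for key in self._cells(bbox):
            self.cells[key].add(obj)

    def remove(self, obj: Hashable) -> None:
        for key in self._cells(self.bboxes.pop(obj)):
            self.cells[key].discard(obj)

    def find(self, bbox: BBox) -> Set[Hashable]:
        """Return the objects whose bbox overlaps `bbox`."""
        x0, y0, x1, y1 = bbox
        found = set()
        for key in self._cells(bbox):
            for obj in self.cells.get(key, ()):
                ox0, oy0, ox1, oy1 = self.bboxes[obj]
                if ox0 <= x1 and x0 <= ox1 and oy0 <= y1 and y0 <= oy1:
                    found.add(obj)
        return found
```

Sets do not keep insertion order, and for objects hashed by identity their iteration order can vary between runs; the later commit "Plane: preserve the object order so that the test result is always consistent" deals with that consequence.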
Yusuke Shinyama
4faccff9c9
Merge pull request #16 from jcushman/master
2x faster group_textboxes function.
2013-04-09 01:58:56 -07:00
Yusuke Shinyama
d8bc13b3af
Merge pull request #13 from gendoc/master
PDFDocument.lookup_name.lookup isn't searching for 'Names' key.
2013-04-09 01:55:54 -07:00
Jordan Reiter
e28b75a462
StringIO
2013-03-27 13:14:58 -04:00
Jordan Reiter
44653071c3
Fixes for LZW error (see https://bitbucket.org/hsoft/pdfminer3k/commits/ae9a4ca0691a/ )
2013-03-27 13:05:29 -04:00
jcushman
f77f196cd3
2x faster group_textboxes function.
2012-06-22 18:11:45 -03:00
jcushman
da3f023b2d
Use set instead of list for Plane's internal collection of objects.
2012-06-22 16:36:33 -03:00
Humberto Pereira
89c81db295
PDFDocument.lookup_names.lookup didn't find 'Names' in some files
2012-03-19 16:42:58 -03:00
Jim Morrison
6413eb7de4
Deal with CMYK images by converting them to RGB. PIL does not invert CMYK images as of PIL 1.1.7, so the invert happens in ImageWriter.
2012-01-24 16:18:36 -08:00
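The inversion step is a per-byte flip, 255 - v, applied before converting to RGB. A sketch under stated assumptions: packed 8-bit CMYK input, the naive formula R = (255 - C)(255 - K)/255 with no color management, and a hypothetical function name; it is not the project's ImageWriter code.

```python
def cmyk_to_rgb(data: bytes, inverted: bool = True) -> bytes:
    """Convert packed 8-bit CMYK pixels to packed RGB.

    If `inverted` is true, each channel is first flipped with 255 - v,
    the step the commit says PIL 1.1.7 did not perform.
    """
    out = bytearray()
    for i in range(0, len(data) // 4 * 4, 4):
        c, m, y, k = data[i:i + 4]
        if inverted:
            c, m, y, k = 255 - c, 255 - m, 255 - y, 255 - k
        # Naive conversion: each ink and the black channel darken the pixel.
        out += bytes((255 - v) * (255 - k) // 255 for v in (c, m, y))
    return bytes(out)
```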
Yusuke Shinyama
c7709045e9
fixed: invalid bmp file output
2011-11-08 00:29:24 +10:00
Yusuke Shinyama
82ff98c7b3
imagewriter now works with text output
2011-11-07 01:15:10 +10:00
Yusuke Shinyama
91174b5665
avoid crash when colorspace is null.
2011-11-06 20:10:48 +10:00
Yusuke Shinyama
3d1652963a
Merge github.com:euske/pdfminer
2011-10-30 15:44:49 +10:00
dwilson
60dbf6bb69
avoids crash in pdf syntax error for missing ids
when an object id is out of range, rather than crashing, only raise a
pdf syntax error if STRICT is enabled and return None otherwise
2011-08-31 17:03:10 -04:00
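The behaviour described in this commit (raise only when STRICT is enabled, otherwise return None) can be sketched with a settings object, in the spirit of the later "change STRICT to be a settings attribute" commit. All names here (`Settings`, `lookup_object`, the dict-based xref) are hypothetical.

```python
class PDFSyntaxError(Exception):
    """Raised for malformed PDF structure."""


class Settings:
    STRICT = False  # lenient mode for this sketch


settings = Settings()


def lookup_object(xref, objid):
    """Return xref[objid]; on a missing id, raise only in strict mode."""
    try:
        return xref[objid]
    except (KeyError, IndexError):
        if settings.STRICT:
            raise PDFSyntaxError("object id out of range: %r" % objid)
        return None
```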
Yusuke Shinyama
f638784e1d
experimental layout analysis improvements
2011-08-14 09:44:21 +09:00
Yusuke Shinyama
cbb8d869c7
removed initial cmap/ directory
2011-07-31 18:05:07 +10:00
Yusuke Shinyama
cdef0d7883
Merge github.com:euske/pdfminer
2011-07-31 17:47:20 +10:00
Yusuke Shinyama
46bb0107aa
fixed: crash due to small layout elements (thanks to hsoft)
2011-07-31 17:44:09 +10:00
Yusuke Shinyama
eec317ae10
Merge pull request #6 from rsennrich/master
cleaner widths for Adobe core 14 fonts. (thanks to rsennrich)
2011-07-31 00:39:36 -07:00
Yusuke Shinyama
24cd161fb7
CCITTFaxFilter.reversed fix
2011-07-31 17:36:02 +10:00
Rico
6e4f36d9a1
get width based on utf-8 char.
fills some gaps and fixes inconsistencies between standard encodings
2011-07-23 16:34:11 +02:00
Yusuke Shinyama
dc8fde0e47
added CCITTFaxFilter support and a very crude image extraction.
2011-07-18 21:07:00 +10:00
Yusuke Shinyama
2707ba75df
added CCITTFaxFilter support and a very crude image extraction.
2011-07-18 21:06:50 +10:00
Yusuke Shinyama
fda6f7ba5d
ccitt.py added.
2011-07-18 17:36:37 +10:00
Yusuke Shinyama
0278076ea8
PNG predictor added
2011-06-07 00:46:33 +09:00
Yusuke Shinyama
18a5058af6
separated predictor functions.
2011-06-07 00:31:03 +09:00
Yusuke Shinyama
170c97a12b
colorspace patch by Lieb Simon
2011-06-06 17:10:12 +09:00
Yusuke Shinyama
2e8180ddee
documentation update and version bump
2011-05-15 01:37:14 +09:00
Yusuke Shinyama
c134596e2f
code cleanup and testcase stabilization
2011-05-15 01:22:19 +09:00
Yusuke Shinyama
e5d02f8653
fixed the infinite recursion bug.
2011-05-14 16:32:09 +09:00
Yusuke Shinyama
0c41b8348e
code cleanup
2011-05-14 15:51:40 +09:00
Yusuke Shinyama
038ce4cd0c
added LTText.get_text() and .text property is no longer accessible.
2011-05-14 15:45:08 +09:00
Yusuke Shinyama
5004e4b28d
layout analysis speedup.
2011-05-14 14:17:39 +09:00
Yusuke Shinyama
095534b294
figure object now does not call analyze.
2011-05-14 14:17:22 +09:00
Yusuke Shinyama
b8d516fc52
extended Plane class.
2011-05-14 14:16:40 +09:00
Yusuke Shinyama
fcf0d74ecc
tweaks for debugging
2011-04-21 22:07:52 +09:00
Yusuke Shinyama
8f9684f6a6
code cleanup: layout analysis
2011-04-21 22:07:04 +09:00
Yusuke Shinyama
0e660dd385
rename: LTPolygon -> LTCurve
2011-04-20 22:05:25 +09:00
Yusuke Shinyama
dab70855bf
LTLine is now strictly horizontal or vertical.
2011-04-20 22:01:54 +09:00
Jonathan J Hunt
ec682539da
Optimized memory usage in TextConverter by ignoring all drawing commands.
2011-03-07 15:11:31 +10:00
Yusuke Shinyama
4918d59bc2
disable caching support
2011-03-03 00:04:43 +09:00
Yusuke Shinyama
18e782f330
canonicalize package names
2011-03-02 23:43:03 +09:00
Yusuke Shinyama
bb26cf9180
eliminate empty textboxes
2011-03-01 20:47:20 +09:00
Yusuke Shinyama
dfd621b98c
minor bugfix. thanks to Hiroshi Manabe.
2011-02-28 19:50:07 +09:00
Yusuke Shinyama
f22b056454
release-20110227
2011-02-27 19:53:12 +09:00
Yusuke Shinyama
a8bf9b159e
docstring fix
2011-02-27 13:09:12 +09:00
Yusuke Shinyama
cabaa10e4f
layout analysis improvement
2011-02-27 12:56:28 +09:00
Yusuke Shinyama
7dbb664db3
code cleanup and more debugging options
2011-02-14 23:42:05 +09:00
Yusuke Shinyama
f00f1dbd04
better layout analysis
2011-02-14 23:41:23 +09:00
Yusuke Shinyama
b2d13db29a
code cleanup
2011-02-14 22:51:20 +09:00
Yusuke Shinyama
cd412308bd
text flow detection bug fix (thanks to fujimoto-san)
2011-02-14 22:32:55 +09:00
Yusuke Shinyama
cbd58121e3
fix aggressive vertical writing detection (which ruins layout)
2011-02-02 23:09:34 +09:00
Yusuke Shinyama
109aedeb43
cfffont extension with no luck
2011-01-25 00:19:07 +09:00
Yusuke Shinyama
4eb6083c09
code cleanup
2011-01-03 18:11:22 +09:00
Yusuke Shinyama
16b2a87b24
CMAP_PATH environment variable support
2011-01-03 18:11:16 +09:00
Yusuke Shinyama
420169a692
release 20101226
2010-12-26 19:06:47 +09:00
Yusuke Shinyama
a24c452ba2
boxes_flow patch by Daniel Gerber
2010-12-26 17:26:39 +09:00
Yusuke Shinyama
3da3adad9b
method renamed: finish(self) -> analyze(self, laparams).
2010-12-26 16:56:21 +09:00
yusuke.shinyama.dummy
84ed94aec0
another bugfix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@281 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:41:03 +00:00
yusuke.shinyama.dummy
9bba7ac08b
oops, forgot to fix this
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@280 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:40:58 +00:00
yusuke.shinyama.dummy
f4ced29713
bugfix by Kevin Brubeck Unhammer
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@278 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-12-25 08:40:45 +00:00
yusuke.shinyama.dummy
2bf9c23801
check_extractable parameter added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@276 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:28 +00:00
yusuke.shinyama.dummy
9f78915ea6
show cid for unknown characters
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@275 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-23 10:53:19 +00:00
yusuke.shinyama.dummy
7374b81383
htmlconverter improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@274 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 15:04:28 +00:00
yusuke.shinyama.dummy
fb4ce96309
add font-family
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@273 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:50 +00:00
yusuke.shinyama.dummy
476ecf7e32
add html exact layout mode; default changed.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@272 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:41 +00:00
yusuke.shinyama.dummy
08c5c66917
add debugging features
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@271 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:34 +00:00
yusuke.shinyama.dummy
434b24b6e5
remove unused method
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@270 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-14 10:07:27 +00:00
yusuke.shinyama.dummy
0d1f00fa9b
improved layout analysis for vertical script
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@269 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:14 +00:00
yusuke.shinyama.dummy
9584845358
layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@268 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:40:05 +00:00
yusuke.shinyama.dummy
edbd3764a7
html layout output fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@267 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:48 +00:00
yusuke.shinyama.dummy
1904b61355
documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@266 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-11-09 10:39:40 +00:00
yusuke.shinyama.dummy
1a25c61a9f
fix empty hexstring bug and test cases.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@265 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-27 12:29:00 +00:00
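The empty-hexstring case sits next to two other edge cases from the PDF specification: whitespace inside `<...>` is ignored, and an odd number of digits is treated as if a final 0 followed (so `<901FA>` equals `<901FA0>`). A sketch with a hypothetical helper name, not the parser's actual code.

```python
def decode_hexstring(body: str) -> bytes:
    """Decode the contents of a PDF hexadecimal string <...>."""
    digits = "".join(ch for ch in body if not ch.isspace())
    if len(digits) % 2:
        digits += "0"  # a missing final digit is assumed to be 0
    return bytes.fromhex(digits)  # "" -> b"" for the empty string <>
```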
yusuke.shinyama.dummy
509ab66319
stay with python2
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@264 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-19 09:57:01 +00:00
yusuke.shinyama.dummy
438b4953be
documentation bit and code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@263 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:49 +00:00
yusuke.shinyama.dummy
71863aec67
minor bugfix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@262 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:43 +00:00
yusuke.shinyama.dummy
6a4b70f54a
code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@261 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-18 15:04:38 +00:00
yusuke.shinyama.dummy
98442ed943
update the version number and documentation
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@256 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:58 +00:00
yusuke.shinyama.dummy
cc139db8a7
bugfix LTChar.is_vertical undefined. verticality is now handled by LTTextBox
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@254 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:23 +00:00
yusuke.shinyama.dummy
21f6cf8fb6
removed PDFStream.decomp(). turned out zlib can handle trailing bytes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@253 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:18 +00:00
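Python's zlib stops at the end of a deflate stream, and `zlib.decompressobj` keeps whatever follows in its `unused_data` attribute instead of failing, so trailing bytes after the compressed data need no special handling. A minimal sketch; the `inflate` name is hypothetical.

```python
import zlib


def inflate(data: bytes) -> bytes:
    """Inflate a Flate-encoded stream, ignoring bytes after its end."""
    d = zlib.decompressobj()
    out = d.decompress(data)  # leftover input ends up in d.unused_data
    return out + d.flush()
```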
yusuke.shinyama.dummy
0ecd0b8f9d
attempt to recover encoding info from texfont
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@252 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:15:12 +00:00
yusuke.shinyama.dummy
afe33312c6
outline bug fixed
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@249 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:14:52 +00:00
yusuke.shinyama.dummy
0b962443ed
patch by Alexander Garden
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@248 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:14:46 +00:00
yusuke.shinyama.dummy
69d9d85685
nunpack TypeError fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@246 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:52 +00:00
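Fixed-width big-endian fields, such as the entries of a cross-reference stream, can be unpacked in Python 3 with `int.from_bytes`, which accepts bytes of any length and so sidesteps type errors from per-length special cases. A sketch of that kind of helper under a hypothetical name (`unpack_be`, not the project's `nunpack`), assuming that an empty field takes a default value, as a zero-width xref stream field does.

```python
def unpack_be(s: bytes, default: int = 0) -> int:
    """Unpack a big-endian unsigned integer of any width; empty -> default."""
    if not s:
        return default
    return int.from_bytes(s, "big")
```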
yusuke.shinyama.dummy
3305c07ba2
layout analysis improved
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@245 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:39 +00:00
yusuke.shinyama.dummy
bc1303e901
layout analysis improvement 1
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@244 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-10-17 05:13:33 +00:00
yusuke.shinyama.dummy
3b2aabaa10
version bump
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@243 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 07:00:01 +00:00
yusuke.shinyama.dummy
0944cfaded
test file simple3.pdf added.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@240 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:41 +00:00
yusuke.shinyama.dummy
83d2086f19
fix minor layout issue
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@239 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:31 +00:00
yusuke.shinyama.dummy
b871331659
improvement in fallback
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@238 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-29 06:39:24 +00:00
yusuke.shinyama.dummy
4554705881
glyphlist bug (due to my misunderstanding of the spec.)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@237 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-26 15:02:46 +00:00
yusuke.shinyama.dummy
ac74542d1f
minor bugfixes
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@234 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-08-26 15:02:29 +00:00
yusuke.shinyama.dummy
1a8692124f
version bump
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@233 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 04:31:12 +00:00
yusuke.shinyama.dummy
2d02833936
release 20100619
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@230 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 03:58:20 +00:00
yusuke.shinyama.dummy
f5aff374fc
some wordings and documentations
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@229 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-19 03:56:50 +00:00
yusuke.shinyama.dummy
a0dd46bd8e
cmap compression patch. thanks to Jakub Wilk
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@228 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-13 13:50:24 +00:00
yusuke.shinyama.dummy
3f831c8104
bugfixes. thanks to Jakub Wilk
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@226 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-13 04:02:30 +00:00
yusuke.shinyama.dummy
702f3088ae
unittest failure fix
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@222 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-06 05:16:29 +00:00
yusuke.shinyama.dummy
cf52476f5e
remove redundancy
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@221 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-06-06 05:16:21 +00:00
yusuke.shinyama.dummy
fe3bdbfce0
text rise support added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@217 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-05-18 14:57:04 +00:00
yusuke.shinyama.dummy
8e92ddca30
latin2ascii.py was moved as a utility
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@215 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-05-05 05:51:11 +00:00
yusuke.shinyama.dummy
7f587cafec
some usage document added
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@214 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 13:31:31 +00:00
yusuke.shinyama.dummy
eb535d4106
change PDFPageAggregator -> PDFLayoutAnalyzer
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@213 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 13:31:21 +00:00
yusuke.shinyama.dummy
833f859449
move TagExtractor
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@212 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 13:31:11 +00:00
yusuke.shinyama.dummy
a16eba30b7
release 20100424
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@210 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 04:32:21 +00:00
yusuke.shinyama.dummy
97848409e5
fix xobject resources bug, thanks to Jose Maria
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@209 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 04:32:03 +00:00
yusuke.shinyama.dummy
9052cd1ea7
better TOC extraction
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@207 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-24 01:34:18 +00:00
yusuke.shinyama.dummy
e77a6ba997
-A (all_texts) option added for layout analysis
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@205 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:30:03 +00:00
yusuke.shinyama.dummy
609c6e1f5f
rename: LayoutItem -> LTItem, LayoutContainer -> LTContainer
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@203 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:29:30 +00:00
yusuke.shinyama.dummy
c81142aa44
image handling addition (untested)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@202 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-04-10 11:05:02 +00:00
yusuke.shinyama.dummy
71defb2272
documentation bit, ready for release-20100327
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@198 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-27 06:06:09 +00:00
yusuke.shinyama.dummy
5f822f6dcb
improved layout analysis.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@197 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-26 11:11:35 +00:00
yusuke.shinyama.dummy
2e5b92c18a
writing mode detection
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@196 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-25 11:38:47 +00:00
yusuke.shinyama.dummy
e536b3ef11
more bugfixes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@194 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-23 10:29:52 +00:00
yusuke.shinyama.dummy
ee34d8d549
bugfix (thanks to Brian Berry).
Remaining TODOs: automatic testing for vertical texts. Various layout analysis tuning.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@193 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 08:36:39 +00:00
yusuke.shinyama.dummy
25636d7c08
release-20100322
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@192 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 06:22:33 +00:00
yusuke.shinyama.dummy
40b36a7c42
consistent test results
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@191 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 06:04:54 +00:00
yusuke.shinyama.dummy
a6523d1a9a
patch from pietvo.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@190 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 04:46:59 +00:00
yusuke.shinyama.dummy
fa13122f09
add regression tests.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@189 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 04:34:52 +00:00
yusuke.shinyama.dummy
cd39642abe
code cleanup
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@188 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-22 04:00:18 +00:00
yusuke.shinyama.dummy
e01cb43e31
add novel layout analysis
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@187 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-21 02:21:37 +00:00
yusuke.shinyama.dummy
ffaaea0bac
layout analysis changed drastically.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@186 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-20 05:43:34 +00:00
yusuke.shinyama.dummy
85c5476623
A couple of bugfixes. Thanks to Sean Manefield.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@185 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-03-12 13:47:39 +00:00
yusuke.shinyama.dummy
23be96c49e
CAUTION! changed the way of internal layout handling.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@184 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-27 03:59:25 +00:00
yusuke.shinyama.dummy
2555b38836
fix typos (patches by sm)
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@183 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-15 14:50:19 +00:00
yusuke.shinyama.dummy
aad921b382
version bump.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@182 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-13 15:02:34 +00:00
yusuke.shinyama.dummy
2dee2efad9
apply more patches
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@181 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-13 15:00:43 +00:00
yusuke.shinyama.dummy
0424fd8dc9
incorporated some patches by Andre Auzi
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@180 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-07 15:11:24 +00:00
yusuke.shinyama.dummy
538a605ac0
several bugfixes.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@179 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-02-07 03:14:00 +00:00
yusuke.shinyama.dummy
63033599ce
release-20100131
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@178 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-31 02:13:30 +00:00
yusuke.shinyama.dummy
dda60dcafc
integrate TODO html.
reorder the code bit.
git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@177 1aa58f4a-7d42-0410-adbc-911cccaed67c
2010-01-31 02:12:51 +00:00