pdfminer.six

Commit Graph

Author	SHA1	Message	Date
Yusuke Shinyama	c753dbac4c	Merge pull request #117 from native-api/png_pred_errors make ValueError's descriptive	2016-09-11 23:55:34 +09:00
Yusuke Shinyama	f1dd9ea6d2	Merge pull request #129 from lucanaso/lucanaso-patch-1 Fixed for rendering non breaking spaces (cid:160)	2016-09-11 23:53:03 +09:00
Yusuke Shinyama	177a4ab937	Fixed: #132 (PDFStream.get_filters: support multiple parameterless filters)	2016-09-11 23:52:13 +09:00
Yusuke Shinyama	e95a483790	Merge pull request #134 from speedplane/feature/Fix-Get-Filters Fix Bug with PDF Stream Decoder	2016-09-11 23:48:42 +09:00
Yusuke Shinyama	64fe538b24	Fixed: #114 (UnicodeEncodeError in PSLiteral)	2016-09-11 23:43:22 +09:00
speedplane	2049462f6f	Revert changes unrelated to this branch.	2016-06-13 23:42:21 -04:00
speedplane	b0b8818a41	Fix a bug with pdfminer which occurs when two or more filters are applied to a stream, even though no parameters are specified. The code would previously drop all of the streams after the first due to misapplication of the zip function.	2016-06-13 23:35:11 -04:00
lucanaso	63bb3caec2	Fixed for rendering non breaking spaces (cid:160) As stated in the PDF specification ISO 32000-1, table in Annex D.2 Latin Character Set and Encodings page 653 to 656 (available here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf): "The SPACE character shall also be encoded as 312 in MacRomanEncoding and as 240 in WinAnsiEncoding. This duplicate code shall signify a nonbreaking space; it shall be typographically the same as (U+003A) SPACE." The duplicate key was missing, therefore PDFMiner was returning the string "(cid:160)". This fix adds the duplicate key in latin_enc.py glyphlist.py does not need to be modified as it already contains a key for non breaking space https://github.com/lucanaso/pdfminer/blob/master/pdfminer/glyphlist.py#L2755.	2015-12-09 16:47:32 +01:00
Ivan Pozdeev	63c9378b8b	make ValueError's descriptive	2015-08-10 03:14:51 +03:00
Yusuke Shinyama	14fd0fd2d6	Fixed: #84 (fontname was in unicode)	2015-04-05 19:02:02 +09:00
speedplane	806ee603ff	More fixes to layout. The compute neighbors function for horizontal lines is only intended to find neighbors on differing lines. However, it's entirely possible that horizontal neighbors could appear. This commit finds horizontal neighbors in a horizontal line and merges them together into a single horizontal line if necessary. This leads to much better text extraction if the PDF was created in a funky way. For example (test case coming), I have seen PDFs which are written almost like vertical columns, but the text is entirely horizontal.	2014-12-12 00:36:59 -05:00
speedplane	45170e7183	There are a number of relatively complex changes here. Comments are in order of where the change appears. 1. When detecting text in a horizontal line, we already add a space between words if separated by more than word_margin. However now, we only do it if there is not already an existing space. This prevents multiple spaces being placed between words. 2. Detect a horizontal line if the line is zero width. This improves our detection of horizontal lines when looking for both horizontal and vertical. 3. Don't detect a vertical line if the previous letter is whitespace. Prevents double spaces being caught as vert lines. 4. Improve upon an unfortunate O(N^2) algorithm which I have seen taking many minutes to execute. Unfortunately, while the "fix" reduces algorithmic complexity, it isn't technically correct, so we only do it when we know things will take a long time.	2014-12-12 00:36:59 -05:00
Yusuke Shinyama	0112112458	Fixed: crash on invalid chr number.	2014-12-09 22:55:47 +09:00
speedplane	36977fbe08	Add debug flags for much of the debug output.	2014-11-11 23:36:58 -05:00
speedplane	ecc4d05675	Fix a unicode conversion bug. See https://github.com/euske/pdfminer/issues/75	2014-11-11 23:34:33 -05:00
Yusuke Shinyama	b0e035c24f	Style fix: always have an explicit return.	2014-07-15 21:38:29 +09:00
Yusuke Shinyama	f5b5e31921	Fixed: DecodeParms array support.	2014-07-09 19:07:27 +09:00
Yusuke Shinyama	137fc3a1ae	Use KWD instead of token.name.	2014-06-30 19:15:21 +09:00
Yusuke Shinyama	1ccfaff411	String-Bytes distinction (first attempt).	2014-06-30 19:05:56 +09:00
Yusuke Shinyama	8791355e1d	Cleanup imports. Use relative imports.	2014-06-26 18:12:39 +09:00
Yusuke Shinyama	2e900e5d10	Fixed for consistent test results. (hopefully...)	2014-06-26 17:41:31 +09:00
Yusuke Shinyama	fe86b4e64e	Changed: StringIO -> io.BytesIO	2014-06-25 19:55:41 +09:00
Yusuke Shinyama	44074b42ea	Added: stripcontrol for XMLConverter (-S option)	2014-06-22 00:33:00 +09:00
Yusuke Shinyama	81391c09f4	Fixed: #56 (with a derpy fix)	2014-06-18 19:11:45 +09:00
Yusuke Shinyama	bb866ae148	Changed: new except syntax (2.6 or above).	2014-06-16 18:50:07 +09:00
Yusuke Shinyama	28e96ba3d0	Use print as a function.	2014-06-15 12:14:33 +09:00
Yusuke Shinyama	0387a6c260	Removed: tuple-unpacking args.	2014-06-15 12:12:13 +09:00
Yusuke Shinyama	a8ec99a848	More autotest tweaks.	2014-06-15 10:52:59 +09:00
Yusuke Shinyama	1384a3fe8d	Code cleanup: removed some debug flags.	2014-06-14 15:43:10 +09:00
Yusuke Shinyama	d9680fca7e	Plane: preserve the object order so that the test result is always consistent.	2014-06-14 14:44:53 +09:00
Yusuke Shinyama	aed248610c	Fixed: dependency on pygame in a unittest.	2014-06-14 12:05:26 +09:00
Yusuke Shinyama	8e14ebf4e1	Use logging module instead of print.	2014-06-14 12:00:49 +09:00
Yusuke Shinyama	8e8e22c095	Fixed a layout bug introduced at `c97ec304`.	2014-06-13 23:05:04 +09:00
numion	a4997d6f10	Implement revision 4 and 5 encryption handler.	2014-05-19 16:27:43 +02:00
Michael R. Hines	ae2547b0f2	Stop throwing exception on LITERALS_DCT_DECODE I have PDF documents with images stream and two filters, don't throw exceptions on the second one (DCT).	2014-05-14 13:25:30 +08:00
Yusuke Shinyama	6b6fc264ff	Code refactoring: CMap and UnicodeMap both inherit CMapBase.	2014-04-16 18:57:16 +09:00
Yusuke Shinyama	b09c37902f	Fixed: issue #48 (thanks to speedplane)	2014-04-09 17:55:50 +09:00
Yusuke Shinyama	7b354c7ab3	Version 20140328	2014-03-28 22:49:18 +09:00
Yusuke Shinyama	340387bfc6	Cleanup: isinstance	2014-03-28 17:50:59 +09:00
Yusuke Shinyama	7849c8724a	Fixed: PDFXRefStream.get_objids returns invalid objids.	2014-03-28 17:29:26 +09:00
Yusuke Shinyama	57adad55d7	Revert the wrong fix.	2014-03-28 17:24:03 +09:00
Yusuke Shinyama	b18e8c549d	Version 20140327	2014-03-28 00:19:52 +09:00
Yusuke Shinyama	ee47a6603a	Fixed: issues #45	2014-03-28 00:18:17 +09:00
Yusuke Shinyama	ab03037444	Version 20140324	2014-03-24 21:03:46 +09:00
Yusuke Shinyama	4b2beba398	Code cleanup.	2014-03-24 20:59:24 +09:00
Yusuke Shinyama	f9079e4c0a	Fixed dumppdf.py issues.	2014-03-24 20:55:00 +09:00
Yusuke Shinyama	607be269ab	Applied a patch by Axel Kaiser.	2014-03-24 20:45:35 +09:00
Yusuke Shinyama	d7c4ff28e9	Applied a patch by Axel Kaiser.	2014-03-24 20:39:30 +09:00
Yusuke Shinyama	636d4caeb3	Fixed the PNG predictor bug. Thanks to Gabor Molnar.	2014-03-24 19:57:05 +09:00
Yusuke Shinyama	c97ec3048e	Changed / to // for clarity.	2013-11-26 21:35:16 +09:00
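
Note on commit b0b8818a41 (multiple parameterless filters): the message attributes the bug to a misapplied zip. The snippet below is only a minimal sketch of that general pitfall, not the actual pdfminer code. The function names, the filter names used as sample data, and the choice to pad with empty dicts are all assumptions made for illustration.

```python
# Hypothetical illustration; these helpers do not exist in pdfminer.

def pair_truncating(filters, params):
    """Pair filters with parameters using plain zip.

    zip stops at the shorter input, so any filter without a
    matching parameter entry is silently dropped.
    """
    return list(zip(filters, params))


def pair_padded(filters, params):
    """Pair every filter with a parameter dict.

    Missing parameter entries are filled with fresh empty dicts
    (an assumed default) so that no filter is lost.
    """
    missing = len(filters) - len(params)
    padded = list(params) + [{} for _ in range(missing)]
    return list(zip(filters, padded))


if __name__ == "__main__":
    filters = ["ASCII85Decode", "FlateDecode"]  # sample data only
    params = [{}]                               # one entry for two filters
    print(pair_truncating(filters, params))
    print(pair_padded(filters, params))
```

With two filters and a single parameter entry, the truncating version yields one pair while the padded version yields two, which matches the symptom the commit message describes (filters after the first being dropped).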

275 Commits (149652c072a150736280f7b27b4f0d0c5325d7bc)