Fix rendering of non-breaking spaces (cid:160)
As stated in the PDF specification ISO 32000-1, Annex D.2 "Latin Character Set and Encodings", pages 653 to 656 (available here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf): "The SPACE character shall also be encoded as 312 in MacRomanEncoding and as 240 in WinAnsiEncoding. This duplicate code shall signify a nonbreaking space; it shall be typographically the same as (U+003A) SPACE." The codes in that table are octal, so 312 and 240 correspond to decimal 202 and 160.

The duplicate key was missing, so PDFMiner returned the string "(cid:160)". This fix adds the duplicate key in latin_enc.py. glyphlist.py does not need to be modified, as it already contains a key for the non-breaking space: https://github.com/lucanaso/pdfminer/blob/master/pdfminer/glyphlist.py#L2755
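The quoted codes and the values in the added nbspace row (202 and 160) can be reconciled with a quick conversion. This is a minimal check, assuming the codes in Annex D.2 are written in octal; the variable names are illustrative only.

```python
# Assumption: the Annex D.2 table lists character codes in octal,
# while latin_enc.py stores them as decimal integers.
mac_nbspace = int("312", 8)  # MacRomanEncoding code for the nonbreaking space
win_nbspace = int("240", 8)  # WinAnsiEncoding code for the nonbreaking space

print(mac_nbspace, win_nbspace)
```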
parent 14fd0fd2d6
commit 63bb3caec2
@@ -162,6 +162,7 @@ ENCODING = [
 ('mu', None, 181, 181, 181),
 ('multiply', None, None, 215, 215),
 ('n', 110, 110, 110, 110),
+('nbspace', None, 202, 160, None),
 ('nine', 57, 57, 57, 57),
 ('ntilde', None, 150, 241, 241),
 ('numbersign', 35, 35, 35, 35),
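To illustrate why the missing row produced "(cid:160)", here is a rough sketch of a code-to-glyph-name lookup built from rows like the ones in the diff. It is not PDFMiner's actual code path: the column order (name, std, mac, win, pdf) is assumed, and `build_win_map` and `glyph_for` are hypothetical helpers written for this example.

```python
# Hypothetical excerpt of the table; column order (name, std, mac, win, pdf)
# is an assumption made for this sketch.
ENCODING = [
    ("n", 110, 110, 110, 110),
    ("nbspace", None, 202, 160, None),
    ("nine", 57, 57, 57, 57),
]


def build_win_map(rows):
    """Map WinAnsi codes to glyph names, skipping rows without a WinAnsi code."""
    return {win: name for (name, _std, _mac, win, _pdf) in rows if win is not None}


def glyph_for(code, code_to_name):
    """Return the glyph name, or a '(cid:N)' placeholder when the code is unmapped."""
    return code_to_name.get(code, "(cid:%d)" % code)


with_fix = build_win_map(ENCODING)
without_fix = build_win_map([row for row in ENCODING if row[0] != "nbspace"])

print(glyph_for(160, with_fix))     # code 160 resolves to a glyph name
print(glyph_for(160, without_fix))  # code 160 falls back to the placeholder
```

With the nbspace row present, code 160 resolves to a glyph name that glyphlist.py can then translate; without it, the lookup has nothing to return but the placeholder.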