pdfminer.six/pdfminer/ascii85.py



""" Python implementation of ASCII85/ASCIIHex decoder (Adobe version).

This code is in the public domain.

"""

import re
import struct

import six  # Python 2+3 compatibility


# ascii85decode(data)
def ascii85decode(data):
    """
    In ASCII85 encoding, every four bytes are encoded with five ASCII
    letters, using 85 different types of characters (as 256**4 < 85**5).
    When the length of the original bytes is not a multiple of 4, a special
    rule is used for round up.

    The Adobe's ASCII85 implementation is slightly different from
    its original in handling the last characters.

    """
    n = b = 0
    out = b''
    for i in six.iterbytes(data):
        c = six.int2byte(i)
        if b'!' <= c and c <= b'u':
            n += 1
            b = b*85+(ord(c)-33)
            if n == 5:
                out += struct.pack('>L', b)
                n = b = 0
        elif c == b'z':
            assert n == 0, str(n)
            out += b'\0\0\0\0'
        elif c == b'~':
            if n:
                for _ in range(5-n):
                    b = b*85+84
                out += struct.pack('>L', b)[:n-1]
            break
    return out


# asciihexdecode(data)
hex_re = re.compile(b'([a-f0-9]{2})', re.IGNORECASE)
trail_re = re.compile(b'^(?:[a-f0-9]{2}|[ \t\n\r\f\v])*'
                      b'([a-f0-9])[ \t\n\r\f\v>]*$', re.IGNORECASE)


def asciihexdecode(data):
    """
    ASCIIHexDecode filter: PDFReference v1.4 section 3.3.1
    For each pair of ASCII hexadecimal digits (0-9 and A-F or a-f), the
    ASCIIHexDecode filter produces one byte of binary data. All white-space
    characters are ignored. A right angle bracket character (>) indicates
    EOD. Any other characters will cause an error. If the filter encounters
    the EOD marker after reading an odd number of hexadecimal digits, it
    will behave as if a 0 followed the last digit.
    """
    def decode(x):
        i = int(x, 16)
        return six.int2byte(i)

    out = b''
    for x in hex_re.findall(data):
        out += decode(x)

    m = trail_re.search(data)
    if m:
        out += decode(m.group(1)+b'0')
    return out
Removing all the "#!/usr/bin/env python" lines, they do not need for … (#34) * Removing all the "#!/usr/bin/env python" lines, they do not need for python3, solving issue number: #19. * Restored all the shebangs in the tools and tests folders (because they are real executables) but used "#!/usr/bin/env python" instead of "#!/usr/bin/python" as this blog points out: https://www.peterbe.com/plog/importance-of-env Removed also the shebang from pdfminer/psparser.py file. 2016-11-08 19:01:11 +00:00
More docstrings. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@151 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-11-04 11:28:32 +00:00
			`""" Python implementation of ASCII85/ASCIIHex decoder (Adobe version).`

			`This code is in the public domain.`

			`"""`

			`import re`
			`import struct`
ASCII85 filter support added. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@48 1aa58f4a-7d42-0410-adbc-911cccaed67c 2008-08-30 07:40:52 +00:00
Enforce pep8 coding-style (#345) * Code Refractor: Use code-style enforcement #312 * Add flake8 to travis-ci * Remove python 2 3 comment on six library. 891 errors > 870 errors. * Remove class and functions comments that consist of just the name. 870 errors > 855 errors. * Fix flake8 errors in pdftypes.py. 855 errors > 833 errors. * Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting * Cleanup pdfinterp.py and add documentation from PDF Reference * Cleanup pdfpage.py * Cleanup pdffont.py * Clean psparser.py * Cleanup high_level.py * Cleanup layout.py * Cleanup pdfparser.py * Cleanup pdfcolor.py * Cleanup rijndael.py * Cleanup converter.py * Rename klass to cls if it is the class variable, to be more consistent with standard practice * Cleanup cmap.py * Cleanup pdfdevice.py * flake8 ignore fontmetrics.py * Cleanup test_pdfminer_psparser.py * Fix flake8 in pdfdocument.py; 339 errors to go * Fix flake8 utils.py; 326 errors togo * pep8 correction for few files in /tools/ 328 > 160 to go (#342) * pep8 correction for few files in /tools/ 328 > 160 to go * pep8 correction: 160 > 5 to go * Fix ascii85.py errors * Fix error in getting index from target that does not exists * Remove commented print lines * Fix flake8 error in pdfinterp.py * Fix python2 specific error by removing argument from print statement * Ignore invalid python2 syntax * Update contributing.md * Added changelog * Remove unused import Co-authored-by: Fakabbir Amin <f4amin@gmail.com> 2019-12-29 20:20:20 +00:00			`import six # Python 2+3 compatibility`
Python 3.4 support added and tested 2014-09-03 11:17:41 +00:00
PEP8: Whitespace changes to match pep8 2013-11-07 08:35:04 +00:00
ASCII85 filter support added. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@48 1aa58f4a-7d42-0410-adbc-911cccaed67c 2008-08-30 07:40:52 +00:00			`# ascii85decode(data)`
			`def ascii85decode(data):`
More docstrings. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@151 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-11-04 11:28:32 +00:00			`"""`
			`In ASCII85 encoding, every four bytes are encoded with five ASCII`
			`letters, using 85 different types of characters (as 2564 < 855).`
			`When the length of the original bytes is not a multiple of 4, a special`
			`rule is used for round up.`
PEP8: Remove trailing whitespace 2013-11-07 07:14:53 +00:00
More docstrings. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@151 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-11-04 11:28:32 +00:00			`The Adobe's ASCII85 implementation is slightly different from`
			`its original in handling the last characters.`
PEP8: Remove trailing whitespace 2013-11-07 07:14:53 +00:00
More docstrings. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@151 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-11-04 11:28:32 +00:00			`"""`
to 4-space indentation git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-10-24 04:41:59 +00:00			`n = b = 0`
String-Bytes distinction (first attempt). 2014-06-30 10:05:56 +00:00			`out = b''`
Python 3.4 support added and tested 2014-09-03 11:17:41 +00:00			`for i in six.iterbytes(data):`
Enforce pep8 coding-style (#345) * Code Refractor: Use code-style enforcement #312 * Add flake8 to travis-ci * Remove python 2 3 comment on six library. 891 errors > 870 errors. * Remove class and functions comments that consist of just the name. 870 errors > 855 errors. * Fix flake8 errors in pdftypes.py. 855 errors > 833 errors. * Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting * Cleanup pdfinterp.py and add documentation from PDF Reference * Cleanup pdfpage.py * Cleanup pdffont.py * Clean psparser.py * Cleanup high_level.py * Cleanup layout.py * Cleanup pdfparser.py * Cleanup pdfcolor.py * Cleanup rijndael.py * Cleanup converter.py * Rename klass to cls if it is the class variable, to be more consistent with standard practice * Cleanup cmap.py * Cleanup pdfdevice.py * flake8 ignore fontmetrics.py * Cleanup test_pdfminer_psparser.py * Fix flake8 in pdfdocument.py; 339 errors to go * Fix flake8 utils.py; 326 errors togo * pep8 correction for few files in /tools/ 328 > 160 to go (#342) * pep8 correction for few files in /tools/ 328 > 160 to go * pep8 correction: 160 > 5 to go * Fix ascii85.py errors * Fix error in getting index from target that does not exists * Remove commented print lines * Fix flake8 error in pdfinterp.py * Fix python2 specific error by removing argument from print statement * Ignore invalid python2 syntax * Update contributing.md * Added changelog * Remove unused import Co-authored-by: Fakabbir Amin <f4amin@gmail.com> 2019-12-29 20:20:20 +00:00			`c = six.int2byte(i)`
String-Bytes distinction (first attempt). 2014-06-30 10:05:56 +00:00			`if b'!' <= c and c <= b'u':`
to 4-space indentation git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-10-24 04:41:59 +00:00			`n += 1`
			`b = b*85+(ord(c)-33)`
			`if n == 5:`
PEP8: Whitespace changes to match pep8 2013-11-07 08:35:04 +00:00			`out += struct.pack('>L', b)`
to 4-space indentation git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-10-24 04:41:59 +00:00			`n = b = 0`
String-Bytes distinction (first attempt). 2014-06-30 10:05:56 +00:00			`elif c == b'z':`
Add string expressions to asserts showing local data (#67) 2017-05-29 07:06:09 +00:00			`assert n == 0, str(n)`
String-Bytes distinction (first attempt). 2014-06-30 10:05:56 +00:00			`out += b'\0\0\0\0'`
			`elif c == b'~':`
to 4-space indentation git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-10-24 04:41:59 +00:00			`if n:`
			`for _ in range(5-n):`
			`b = b*85+84`
PEP8: Whitespace changes to match pep8 2013-11-07 08:35:04 +00:00			`out += struct.pack('>L', b)[:n-1]`
to 4-space indentation git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-10-24 04:41:59 +00:00			`break`
			`return out`
ASCII85 filter support added. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@48 1aa58f4a-7d42-0410-adbc-911cccaed67c 2008-08-30 07:40:52 +00:00
Enforce pep8 coding-style (#345) * Code Refractor: Use code-style enforcement #312 * Add flake8 to travis-ci * Remove python 2 3 comment on six library. 891 errors > 870 errors. * Remove class and functions comments that consist of just the name. 870 errors > 855 errors. * Fix flake8 errors in pdftypes.py. 855 errors > 833 errors. * Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting * Cleanup pdfinterp.py and add documentation from PDF Reference * Cleanup pdfpage.py * Cleanup pdffont.py * Clean psparser.py * Cleanup high_level.py * Cleanup layout.py * Cleanup pdfparser.py * Cleanup pdfcolor.py * Cleanup rijndael.py * Cleanup converter.py * Rename klass to cls if it is the class variable, to be more consistent with standard practice * Cleanup cmap.py * Cleanup pdfdevice.py * flake8 ignore fontmetrics.py * Cleanup test_pdfminer_psparser.py * Fix flake8 in pdfdocument.py; 339 errors to go * Fix flake8 utils.py; 326 errors togo * pep8 correction for few files in /tools/ 328 > 160 to go (#342) * pep8 correction for few files in /tools/ 328 > 160 to go * pep8 correction: 160 > 5 to go * Fix ascii85.py errors * Fix error in getting index from target that does not exists * Remove commented print lines * Fix flake8 error in pdfinterp.py * Fix python2 specific error by removing argument from print statement * Ignore invalid python2 syntax * Update contributing.md * Added changelog * Remove unused import Co-authored-by: Fakabbir Amin <f4amin@gmail.com> 2019-12-29 20:20:20 +00:00
AsciiHexDecode filter patch incorporated. Thanks to Troy Bollinger. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@86 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-04-08 10:55:01 +00:00			`# asciihexdecode(data)`
Enforce pep8 coding-style (#345) * Code Refractor: Use code-style enforcement #312 * Add flake8 to travis-ci * Remove python 2 3 comment on six library. 891 errors > 870 errors. * Remove class and functions comments that consist of just the name. 870 errors > 855 errors. * Fix flake8 errors in pdftypes.py. 855 errors > 833 errors. * Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting * Cleanup pdfinterp.py and add documentation from PDF Reference * Cleanup pdfpage.py * Cleanup pdffont.py * Clean psparser.py * Cleanup high_level.py * Cleanup layout.py * Cleanup pdfparser.py * Cleanup pdfcolor.py * Cleanup rijndael.py * Cleanup converter.py * Rename klass to cls if it is the class variable, to be more consistent with standard practice * Cleanup cmap.py * Cleanup pdfdevice.py * flake8 ignore fontmetrics.py * Cleanup test_pdfminer_psparser.py * Fix flake8 in pdfdocument.py; 339 errors to go * Fix flake8 utils.py; 326 errors togo * pep8 correction for few files in /tools/ 328 > 160 to go (#342) * pep8 correction for few files in /tools/ 328 > 160 to go * pep8 correction: 160 > 5 to go * Fix ascii85.py errors * Fix error in getting index from target that does not exists * Remove commented print lines * Fix flake8 error in pdfinterp.py * Fix python2 specific error by removing argument from print statement * Ignore invalid python2 syntax * Update contributing.md * Added changelog * Remove unused import Co-authored-by: Fakabbir Amin <f4amin@gmail.com> 2019-12-29 20:20:20 +00:00			`hex_re = re.compile(b'([a-f0-9]{2})', re.IGNORECASE)`
			`trail_re = re.compile(b'^(?:[a-f0-9]{2}\|[ \t\n\r\f\v])*'`
			`b'([a-f0-9])[ \t\n\r\f\v>]*$', re.IGNORECASE)`
PEP8: Whitespace changes to match pep8 2013-11-07 08:35:04 +00:00

AsciiHexDecode filter patch incorporated. Thanks to Troy Bollinger. git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@86 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-04-08 10:55:01 +00:00			`def asciihexdecode(data):`
to 4-space indentation git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-10-24 04:41:59 +00:00			`"""`
			`ASCIIHexDecode filter: PDFReference v1.4 section 3.3.1`
			`For each pair of ASCII hexadecimal digits (0-9 and A-F or a-f), the`
			`ASCIIHexDecode filter produces one byte of binary data. All white-space`
			`characters are ignored. A right angle bracket character (>) indicates`
			`EOD. Any other characters will cause an error. If the filter encounters`
			`the EOD marker after reading an odd number of hexadecimal digits, it`
			`will behave as if a 0 followed the last digit.`
			`"""`
Python 3.4 compatibility + tests 2014-09-04 07:36:19 +00:00			`def decode(x):`
Enforce pep8 coding-style (#345) * Code Refractor: Use code-style enforcement #312 * Add flake8 to travis-ci * Remove python 2 3 comment on six library. 891 errors > 870 errors. * Remove class and functions comments that consist of just the name. 870 errors > 855 errors. * Fix flake8 errors in pdftypes.py. 855 errors > 833 errors. * Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting * Cleanup pdfinterp.py and add documentation from PDF Reference * Cleanup pdfpage.py * Cleanup pdffont.py * Clean psparser.py * Cleanup high_level.py * Cleanup layout.py * Cleanup pdfparser.py * Cleanup pdfcolor.py * Cleanup rijndael.py * Cleanup converter.py * Rename klass to cls if it is the class variable, to be more consistent with standard practice * Cleanup cmap.py * Cleanup pdfdevice.py * flake8 ignore fontmetrics.py * Cleanup test_pdfminer_psparser.py * Fix flake8 in pdfdocument.py; 339 errors to go * Fix flake8 utils.py; 326 errors togo * pep8 correction for few files in /tools/ 328 > 160 to go (#342) * pep8 correction for few files in /tools/ 328 > 160 to go * pep8 correction: 160 > 5 to go * Fix ascii85.py errors * Fix error in getting index from target that does not exists * Remove commented print lines * Fix flake8 error in pdfinterp.py * Fix python2 specific error by removing argument from print statement * Ignore invalid python2 syntax * Update contributing.md * Added changelog * Remove unused import Co-authored-by: Fakabbir Amin <f4amin@gmail.com> 2019-12-29 20:20:20 +00:00			`i = int(x, 16)`
Python 3.4 compatibility + tests 2014-09-04 07:36:19 +00:00			`return six.int2byte(i)`
Add string expressions to asserts showing local data (#67) 2017-05-29 07:06:09 +00:00
Enforce pep8 coding-style (#345) * Code Refractor: Use code-style enforcement #312 * Add flake8 to travis-ci * Remove python 2 3 comment on six library. 891 errors > 870 errors. * Remove class and functions comments that consist of just the name. 870 errors > 855 errors. * Fix flake8 errors in pdftypes.py. 855 errors > 833 errors. * Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting * Cleanup pdfinterp.py and add documentation from PDF Reference * Cleanup pdfpage.py * Cleanup pdffont.py * Clean psparser.py * Cleanup high_level.py * Cleanup layout.py * Cleanup pdfparser.py * Cleanup pdfcolor.py * Cleanup rijndael.py * Cleanup converter.py * Rename klass to cls if it is the class variable, to be more consistent with standard practice * Cleanup cmap.py * Cleanup pdfdevice.py * flake8 ignore fontmetrics.py * Cleanup test_pdfminer_psparser.py * Fix flake8 in pdfdocument.py; 339 errors to go * Fix flake8 utils.py; 326 errors togo * pep8 correction for few files in /tools/ 328 > 160 to go (#342) * pep8 correction for few files in /tools/ 328 > 160 to go * pep8 correction: 160 > 5 to go * Fix ascii85.py errors * Fix error in getting index from target that does not exists * Remove commented print lines * Fix flake8 error in pdfinterp.py * Fix python2 specific error by removing argument from print statement * Ignore invalid python2 syntax * Update contributing.md * Added changelog * Remove unused import Co-authored-by: Fakabbir Amin <f4amin@gmail.com> 2019-12-29 20:20:20 +00:00			`out = b''`
Python 3.4 compatibility + tests 2014-09-04 07:36:19 +00:00			`for x in hex_re.findall(data):`
Enforce pep8 coding-style (#345) * Code Refractor: Use code-style enforcement #312 * Add flake8 to travis-ci * Remove python 2 3 comment on six library. 891 errors > 870 errors. * Remove class and functions comments that consist of just the name. 870 errors > 855 errors. * Fix flake8 errors in pdftypes.py. 855 errors > 833 errors. * Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting * Cleanup pdfinterp.py and add documentation from PDF Reference * Cleanup pdfpage.py * Cleanup pdffont.py * Clean psparser.py * Cleanup high_level.py * Cleanup layout.py * Cleanup pdfparser.py * Cleanup pdfcolor.py * Cleanup rijndael.py * Cleanup converter.py * Rename klass to cls if it is the class variable, to be more consistent with standard practice * Cleanup cmap.py * Cleanup pdfdevice.py * flake8 ignore fontmetrics.py * Cleanup test_pdfminer_psparser.py * Fix flake8 in pdfdocument.py; 339 errors to go * Fix flake8 utils.py; 326 errors togo * pep8 correction for few files in /tools/ 328 > 160 to go (#342) * pep8 correction for few files in /tools/ 328 > 160 to go * pep8 correction: 160 > 5 to go * Fix ascii85.py errors * Fix error in getting index from target that does not exists * Remove commented print lines * Fix flake8 error in pdfinterp.py * Fix python2 specific error by removing argument from print statement * Ignore invalid python2 syntax * Update contributing.md * Added changelog * Remove unused import Co-authored-by: Fakabbir Amin <f4amin@gmail.com> 2019-12-29 20:20:20 +00:00			`out += decode(x)`
Add string expressions to asserts showing local data (#67) 2017-05-29 07:06:09 +00:00
to 4-space indentation git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@142 1aa58f4a-7d42-0410-adbc-911cccaed67c 2009-10-24 04:41:59 +00:00			`m = trail_re.search(data)`
			`if m:`
Enforce pep8 coding-style (#345) * Code Refractor: Use code-style enforcement #312 * Add flake8 to travis-ci * Remove python 2 3 comment on six library. 891 errors > 870 errors. * Remove class and functions comments that consist of just the name. 870 errors > 855 errors. * Fix flake8 errors in pdftypes.py. 855 errors > 833 errors. * Moving flake8 testing from .travis.yml to tox.ini to ensure local testing before commiting * Cleanup pdfinterp.py and add documentation from PDF Reference * Cleanup pdfpage.py * Cleanup pdffont.py * Clean psparser.py * Cleanup high_level.py * Cleanup layout.py * Cleanup pdfparser.py * Cleanup pdfcolor.py * Cleanup rijndael.py * Cleanup converter.py * Rename klass to cls if it is the class variable, to be more consistent with standard practice * Cleanup cmap.py * Cleanup pdfdevice.py * flake8 ignore fontmetrics.py * Cleanup test_pdfminer_psparser.py * Fix flake8 in pdfdocument.py; 339 errors to go * Fix flake8 utils.py; 326 errors togo * pep8 correction for few files in /tools/ 328 > 160 to go (#342) * pep8 correction for few files in /tools/ 328 > 160 to go * pep8 correction: 160 > 5 to go * Fix ascii85.py errors * Fix error in getting index from target that does not exists * Remove commented print lines * Fix flake8 error in pdfinterp.py * Fix python2 specific error by removing argument from print statement * Ignore invalid python2 syntax * Update contributing.md * Added changelog * Remove unused import Co-authored-by: Fakabbir Amin <f4amin@gmail.com> 2019-12-29 20:20:20 +00:00			`out += decode(m.group(1)+b'0')`
Python 3.4 compatibility + tests 2014-09-04 07:36:19 +00:00			`return out`