pdfminer.six/tests/test_highlevel_extracttext.py

import unittest

from helpers import absolute_sample_path
from pdfminer.high_level import extract_text


def run(sample_path):
    """Return the text extracted from the given sample PDF."""
    absolute_path = absolute_sample_path(sample_path)
    return extract_text(absolute_path)


# Expected extract_text() output, keyed by sample file name.
test_strings = {
    "simple1.pdf": "Hello \n\nWorld\n\nHello \n\nWorld\n\n"
                   "H e l l o  \n\nW o r l d\n\n"
                   "H e l l o  \n\nW o r l d\n\n\f",
    "simple2.pdf": "\f",
    "simple3.pdf": "HelloHello\n\nWorld\n\nWorld\n\n\f",
}


class TestExtractText(unittest.TestCase):
    def test_simple1(self):
        test_file = "simple1.pdf"
        s = run(test_file)
        self.assertEqual(s, test_strings[test_file])

    def test_simple2(self):
        test_file = "simple2.pdf"
        s = run(test_file)
        self.assertEqual(s, test_strings[test_file])

    def test_simple3(self):
        test_file = "simple3.pdf"
        s = run(test_file)
        self.assertEqual(s, test_strings[test_file])


if __name__ == "__main__":
    unittest.main()
# History:
#   2019-11-07 06:54:10 +00:00  Added: simple wrapper to extract text from pdf
#                               (#330). Fixes #327.
#   2019-12-29 20:20:20 +00:00  Enforce pep8 coding-style (#345); reformatted
#                               the "simple1.pdf" entry of test_strings.