Commit Graph

  • c026ac0509 Remove confusing sentence about adding spaces Pieter Marsman 2020-03-26 23:01:38 +0100
  • ad2a2dc50c Small change in sentence about word_margin Pieter Marsman 2020-03-26 22:59:53 +0100
  • 1a4a06da9f Fix #392 Split out IO logic from high level functions (#393) Jake Stockwin 2020-03-26 21:52:00 +0000
  • 55bfe02584 Pin version of tox to ensure python 3.4 support Jake Stockwin 2020-03-26 14:18:02 +0000
  • 6128e0e42d Update boxes_flow documentation for pdf2text Jake Stockwin 2020-03-26 10:54:09 +0000
  • 5d81e8e407 Updated misleading documentation about word_margin Jake Stockwin 2020-03-26 10:27:20 +0000
  • 0d9c914250 Small wording changes, remove unnecessary comment Jake Stockwin 2020-03-26 10:11:33 +0000
  • abd72c41e0 PR Review - move open_filename to utils Jake Stockwin 2020-03-26 10:05:54 +0000
  • ffecd5e57d Allow file-like inputs to high level functions (#392) Jake Stockwin 2020-03-26 10:04:28 +0000
  • 20bf807e66 Apply comments from code review Jake Stockwin 2020-03-26 09:56:24 +0000
  • b095408482 Update documentation for boxes_flow, allow None Jake Stockwin 2020-03-26 09:52:41 +0000
  • abc84293dc Change method naming and call in group_textlines jacobefaust 2020-03-25 22:10:11 -0500
  • 41a14e3d38 Eliminate split_tables parameter. Switch cell_margin default. jacobefaust 2020-03-25 21:34:37 -0500
  • 1cc1b961c5 Also group center-aligned text lines in addition to left-aligned and right-aligned text lines (#382) (#384) Jake Stockwin 2020-03-23 21:38:39 +0000
  • 69489c5c6a Cosmetic changes from code review Jake Stockwin 2020-03-17 09:37:52 +0000
  • bc36a366f8 Update changelog Jake Stockwin 2020-03-17 09:34:22 +0000
  • 9d7fe2d9ee Catch ValueError when converting font encoding differences to characters (#389) Pieter Marsman 2020-03-16 20:12:45 +0100
  • 0c4ad2a668 Merge branch 'develop' into fix-385-catch-value-error Pieter Marsman 2020-03-16 20:12:20 +0100
  • 6b3c75cd7d Add tests for find_neighbors Jake Stockwin 2020-03-16 13:43:49 +0000
  • f2f97e17c0 Add missing docstrings Jake Stockwin 2020-03-16 13:17:27 +0000
  • a55d4141bf Add comparison private methods to LTTextLines Jake Stockwin 2020-03-16 13:10:28 +0000
  • c0b1fe64e1 Group text lines if they are centered (#382) Jake Stockwin 2020-03-04 09:33:43 +0000
  • a087d6dfc8 Fix typo in README.md (#388) fzyzcjy 2020-03-14 18:00:37 +0800
  • 1d773dc38a Fix grouping textlines when bounding box of parent container is wrong (#386) Pieter Marsman 2020-03-14 10:33:39 +0100
  • 66b7dc0fe4 Added line to CHANGELOG.md Pieter Marsman 2020-03-14 10:28:09 +0100
  • d1e8a3bf79 Add test for catching ValueError and KeyError when font encoding differences are invalid Pieter Marsman 2020-03-14 10:23:30 +0100
  • f2eecd9be9 Catch ValueError when calling `name2unicode` when a unicode value cannot be parsed Pieter Marsman 2020-03-14 10:15:11 +0100
  • 3c4191ed64 Update README.md - fix typo "inmplement" -> "implement" fzyzcjy 2020-03-12 20:50:06 +0800
  • 65240fb4f7 Added CHANGELOG.md line Pieter Marsman 2020-03-10 21:12:11 +0100
  • 038d855b9e Added test for grouping textlines where 1 is outside the parent bounding box Pieter Marsman 2020-03-10 21:04:20 +0100
  • 5a9bc8e608 Fix edge case: when no neighbors are found a line should form its own text box Pieter Marsman 2020-03-10 20:45:11 +0100
  • df600900ff Default value for --all-texts should be false, because using the flag enables it Pieter Marsman 2020-03-10 20:44:13 +0100
  • 7e91d4ec6d Improve docs and github templates Pieter Marsman 2020-03-08 14:53:16 +0100
  • 7d432fdbee Correct doc format 2 match pdfminer best practices jacobefaust 2020-03-07 20:09:18 -0600
  • 776a399cb8 Revert Change to LTCurve __init__ jacobefaust 2020-03-07 19:30:58 -0600
  • d07ad208e9 Add add'l documentation. jacobefaust 2020-03-02 20:38:04 -0600
  • 5e104ae0ad Fix pep8 violations jacobefaust 2020-03-02 20:18:20 -0600
  • adb6cdb3af Merge branch 'master' into master jacobefaust 2020-03-02 19:52:19 -0600
  • e3f247245e Adjust split logic. Add addl classes to splitobjs jacobefaust 2020-02-28 20:35:15 -0600
  • a7daed7501 Adjust default and keep splitobjs jacobefaust 2020-02-27 20:26:21 -0600
  • 9a5f051151 Remove false for loop jacobefaust 2020-02-19 19:47:32 -0600
  • 092e435b3b Pep8 and add splitobjs input to group_objects jacobefaust 2020-02-18 20:51:01 -0600
  • 5ad66d53a8 improve documentation. Add splitting to group_objects jacobefaust 2020-02-18 20:48:09 -0600
  • 12f0ceec1e More Pep8 and document higher level calls jacobefaust 2020-02-16 22:47:58 -0600
  • 5d012094f1 Pep8 __repr__ jacobefaust 2020-02-13 20:19:16 -0600
  • dab7947510 IndentationError Gilbert Brault 2020-02-10 11:14:57 +0100
  • c7aa3ef826 improved documentation Gilbert Brault 2020-02-10 08:27:38 +0100
  • 40c6dc3015 added comments to lrtd_parse Gilbert Brault 2020-02-10 08:24:35 +0100
  • 95563605e3 added lrtd_parse function to high_level module Gilbert Brault 2020-02-10 08:19:04 +0100
  • 80dbb3bca6 Fix converting path to multiple rectangles Kwok-kuen Cheung 2020-02-05 14:09:50 +0800
  • bab6d154c2 Bump version 20200124 20200124 Pieter Marsman 2020-01-24 12:38:11 +0100
  • 1c3047b68b Remove samples/ directory from source distribution to prevent downloading all pdf's when installing pdfminer.six (#364) Pieter Marsman 2020-01-24 12:36:02 +0100
  • 622dd908ff Remove unused imports Pieter Marsman 2020-01-24 12:30:27 +0100
  • 2b12bfd6cd Added line to CHANGELOG.md Pieter Marsman 2020-01-24 12:29:54 +0100
  • 218dc2c308 Remove issue-00152-embedded-pdf.pdf because it contains a possible exploit. Pieter Marsman 2020-01-24 12:21:41 +0100
  • 67cc5edf6c Remove samples/ and docs/ from source distribution. The samples/ directory contains pdf's for testing purposes and the docs/ contain readthedocs documentation and is published online. Pieter Marsman 2020-01-24 12:16:30 +0100
  • bc494ff03c Bump version to 20200121 20200121 Pieter Marsman 2020-01-21 21:13:52 +0100
  • 52da65d5eb Remove latin2ascii.py because it converts the latin-interpreted bytes of a file to ascii, but this has not much to do with PDF's. (#360) Pieter Marsman 2020-01-16 22:26:01 +0100
  • 410d7ecac3 Fix value for font-family in html by removing the subset tag from the PDF font-name (#357) Pieter Marsman 2020-01-16 22:25:20 +0100
  • b07432e8b7 Merge branch 'develop' into fix-font-name Pieter Marsman 2020-01-16 22:23:30 +0100
  • bd99beccde Use `html.escape()` to encode strings for html and always return `str` instead of `bytes` Pieter Marsman 2020-01-16 22:22:46 +0100
  • fff3ac2ba6 Fix bug in computing character bounding box (#348) Pieter Marsman 2020-01-16 22:15:50 +0100
  • b8c9bc3cf9 Merge branch 'develop' into fix-font-height Pieter Marsman 2020-01-16 22:12:20 +0100
  • 2f7f5d2667 Fallback on backwards-compatible key (F) for embedded files URL's when the unicode URL (UF) does not exist (#338) Pieter Marsman 2020-01-16 22:11:42 +0100
  • 3bfbfc97eb Added line to CHANGELOG.md Pieter Marsman 2020-01-14 21:47:53 +0100
  • b341e0b1dd Remove latin2ascii.py because it converts the latin-interpreted bytes of a file to ascii, but this has not much to do with PDF's. Pieter Marsman 2020-01-14 21:41:09 +0100
  • 909c07e283 Add line to CHANGELOG Pieter Marsman 2020-01-09 20:51:36 +0100
  • 358c1c8b08 Add test for pdf that contains embedded pdf, and fix additional errors in looping over multiple xrefs Pieter Marsman 2020-01-09 20:47:08 +0100
  • 56c23ee31c Fix getting filename when extracting embedded files Pieter Marsman 2019-11-17 17:04:59 +0100
  • f5a308e277 Add documentation and clear variable name Pieter Marsman 2020-01-09 19:52:40 +0100
  • a33448b881 Added line to CHANGELOG.md Pieter Marsman 2020-01-09 19:51:19 +0100
  • ac565ad0ca Fix font name by removing subset tag Pieter Marsman 2020-01-09 19:43:18 +0100
  • 824e7552b5 Add line to CHANGELOG Pieter Marsman 2020-01-09 19:18:28 +0100
  • 771b9d57dc Add test for font sizes, and for this a high-level function that returns an iterator of LTPage objects Pieter Marsman 2020-01-07 22:44:12 +0100
  • d654c1406c Change expected outcome of `python tools/pdf2txt.py samples/simple3.pdf`, because it looks like an improvement. However, when I view `samples/simple3.pdf` I don't see any text at all. The change in expected outcome is explained by the fact that the bounding boxes of characters can be different, depending on the `/FontBBox` parameter of the font. Pieter Marsman 2019-12-30 17:56:14 +0100
  • 9b57703aff Refactor LTChar bounding box computation Pieter Marsman 2019-12-30 17:42:17 +0100
  • 5966b6fd03 Remove scaling font height/width with size of font bounding box Pieter Marsman 2019-12-30 17:40:24 +0100
  • 0b1741b9bf Pack the /P (permissions) entry from the /Encrypt dictionary in the file trailer, as unsigned long (#352) Recursing 2020-01-07 21:59:13 +0100
  • 7fb98264fb Standardize import style; multiple imports from same module on one line Pieter Marsman 2020-01-07 21:45:36 +0100
  • 066314a069 Only ints can be converted to a uint using two's-complement method Pieter Marsman 2020-01-07 21:43:02 +0100
  • 264f6c72c6 Add line to CHANGELOG.md Pieter Marsman 2020-01-07 20:46:54 +0100
  • f566546e9d Add test for issue #352 Pieter Marsman 2020-01-07 20:43:10 +0100
  • 1b4e2b9495 Extract function for resolving a two's-complement Pieter Marsman 2020-01-07 20:35:50 +0100
  • e4790fdbc2 Add AES as supported encryption method to docs Pieter Marsman 2020-01-07 18:38:53 +0100
  • cd6d425d75 handle negative values for p Recursing 2020-01-06 22:49:05 +0100
  • 90a366b8b6 Treat the permissions (the /P entry) as unsigned long, fix #186 Recursing 2020-01-05 15:21:36 +0100
  • ffb242f194 add check_extractable argument to high level functions Recursing 2020-01-05 13:54:39 +0100
  • b27d3d0aff Bump version 20200104 Pieter Marsman 2020-01-04 18:15:15 +0100
  • 6eb9957e8a Update docs: at least python 3.4 is needed now Pieter Marsman 2020-01-04 16:51:54 +0100
  • 3502dc9f3b Drop support for legacy Python 2 (#346) Pieter Marsman 2020-01-04 16:47:07 +0100
  • d79bfdfe5d Remove leftover class extensions of object Pieter Marsman 2019-12-29 23:30:08 +0100
  • 787866ba47 Remove python2 leftovers Pieter Marsman 2019-12-29 23:27:25 +0100
  • f4e12cf7f5 Added line to changelog Pieter Marsman 2019-12-29 23:23:16 +0100
  • aa2c03052b Fix flake8 errors Pieter Marsman 2019-12-29 23:20:15 +0100
  • 2657b48761 Update CONTRIBUTING.md Hugo 2019-10-20 13:38:56 +0300
  • eb14ee98d7 Replace six Hugo 2019-10-17 14:42:56 +0300
  • b911361f17 Python 3 imports Hugo 2019-10-17 14:48:36 +0300
  • 33ea6060bb Upgrade Python syntax with pyupgrade --py3-plus Hugo 2019-10-19 20:39:08 +0300
  • 885d4939ba Upgrade Python syntax with pyupgrade Hugo 2019-10-19 20:33:35 +0300
  • 20d6e52696 Add python_requires to help pip Hugo 2019-10-17 13:32:36 +0300