Only use xref fallback if `PDFNoValidXRef` is raised and `fallback` is True (#684)
* check obj type
* update changelog
* Update CHANGELOG.md
* add changes
* update change
* update changelog
* Use fallback in except clause
* Update changelog.md

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
Co-authored-by: Tony Tong <baojia.tong@kensho.com>
pull/707/head
parent dc530f3a6f
commit 4b138a6bc5
@@ -15,6 +15,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 - Add handling of JPXDecode filter to enable extraction of images for some pdfs ([#645](https://github.com/pdfminer/pdfminer.six/pull/645))
 - Fix extraction of jbig2 files, which was producing invalid files ([#652](https://github.com/pdfminer/pdfminer.six/pull/653))
 - Crash in `pdf2txt.py --boxes-flow=disabled` ([#682](https://github.com/pdfminer/pdfminer.six/pull/682))
+- Only use xref fallback if `PDFNoValidXRef` is raised and `fallback` is True ([#684](https://github.com/pdfminer/pdfminer.six/pull/684))
+
+### Changed
+- Replace warnings.warn with logging.Logger.warning in line with [recommended use](https://docs.python.org/3/howto/logging.html#when-to-use-logging) ([#673](https://github.com/pdfminer/pdfminer.six/pull/673))
 
 ## [20211012]
 
@@ -41,7 +45,6 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 - Support for Python 3.4 and 3.5 ([#522](https://github.com/pdfminer/pdfminer.six/pull/522))
 - Unused dependency on `sortedcontainers` package ([#525](https://github.com/pdfminer/pdfminer.six/pull/525))
 - Support for non-standard output streams that are not binary ([#523](https://github.com/pdfminer/pdfminer.six/pull/523))
-- Replace warnings.warn with logging.Logger.warning in line with [recommended use](https://docs.python.org/3/howto/logging.html#when-to-use-logging) ([#673](https://github.com/pdfminer/pdfminer.six/pull/673))
 - Dependency on typing-extensions introduced by [#661](https://github.com/pdfminer/pdfminer.six/pull/661) ([#677](https://github.com/pdfminer/pdfminer.six/pull/677))
 
 ## [20201018]
@@ -11,7 +11,7 @@ from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
 from . import settings
 from .arcfour import Arcfour
 from .pdfparser import PDFSyntaxError, PDFParser, PDFStreamParser
-from .pdftypes import DecipherCallable, PDFException, PDFTypeError, PDFStream,\
+from .pdftypes import DecipherCallable, PDFException, PDFTypeError, PDFStream, \
     PDFObjectNotFound, decipher_all, int_value, str_value, list_value, \
     uint_value, dict_value, stream_value
 from .psparser import PSEOF, literal_name, LIT, KWD
@@ -706,12 +706,12 @@ class PDFDocument:
             pos = self.find_xref(parser)
             self.read_xref_from(parser, pos, self.xrefs)
         except PDFNoValidXRef:
-            pass  # fallback = True
-        if fallback:
-            parser.fallback = True
-            newxref = PDFXRefFallback()
-            newxref.load(parser)
-            self.xrefs.append(newxref)
+            if fallback:
+                parser.fallback = True
+                newxref = PDFXRefFallback()
+                newxref.load(parser)
+                self.xrefs.append(newxref)
+
         for xref in self.xrefs:
             trailer = xref.get_trailer()
             if not trailer:
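The control-flow change in the `PDFDocument` hunk above can be illustrated with a small stand-alone sketch. It does not use pdfminer.six itself: `PDFNoValidXRef` here is a local stand-in for the library's exception, and `load_xrefs_old`, `load_xrefs_new`, `good_xref`, `broken_xref` and `fallback_xref` are hypothetical helpers invented for this illustration. It assumes, as the diff suggests, that the old code ran the fallback whenever `fallback` was true, while the new code runs it only after `PDFNoValidXRef` has been raised.

```python
class PDFNoValidXRef(Exception):
    """Local stand-in for pdfminer's exception of the same name."""


def load_xrefs_old(read_xref, load_fallback, fallback=True):
    """Sketch of the old flow: the fallback runs whenever `fallback` is True."""
    xrefs = []
    try:
        xrefs.append(read_xref())
    except PDFNoValidXRef:
        pass
    if fallback:
        # Executed even when the regular xref table was read successfully.
        xrefs.append(load_fallback())
    return xrefs


def load_xrefs_new(read_xref, load_fallback, fallback=True):
    """Sketch of the new flow: the fallback runs only after PDFNoValidXRef."""
    xrefs = []
    try:
        xrefs.append(read_xref())
    except PDFNoValidXRef:
        if fallback:
            xrefs.append(load_fallback())
    return xrefs


# Hypothetical readers used only to exercise the two flows.
def good_xref():
    return "xref"


def broken_xref():
    raise PDFNoValidXRef()


def fallback_xref():
    return "fallback-xref"


if __name__ == "__main__":
    print(load_xrefs_old(good_xref, fallback_xref))
    print(load_xrefs_new(good_xref, fallback_xref))
    print(load_xrefs_new(broken_xref, fallback_xref))
    print(load_xrefs_new(broken_xref, fallback_xref, fallback=False))
```

With a readable xref table, the old flow in this sketch still appends a fallback entry while the new flow does not. That difference is what the changelog entry for #684 describes.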