pdfminer.six/tools
Andrew Baumann d87bd025dd
pdf2txt: clean up construction of LAParams from arguments (#682)
* Fix pdf2txt --boxes-flow=disabled

Fixes:
```
$ pdf2txt.py --boxes-flow=disabled test.pdf
Traceback (most recent call last):
  File "tools/pdf2txt.py", line 204, in <module>
    sys.exit(main())
  File "tools/pdf2txt.py", line 198, in main
    outfp = extract_text(**vars(A))
  File "tools/pdf2txt.py", line 66, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "pdfminer/high_level.py", line 85, in extract_text_to_fp
    interpreter.process_page(page)
  File "pdfminer/pdfinterp.py", line 896, in process_page
    self.device.end_page(page)
  File "pdfminer/converter.py", line 51, in end_page
    self.cur_item.analyze(self.laparams)
  File "pdfminer/layout.py", line 822, in analyze
    group.analyze(laparams)
  File "pdfminer/layout.py", line 575, in analyze
    LTTextGroup.analyze(self, laparams)
  File "pdfminer/layout.py", line 362, in analyze
    obj.analyze(laparams)
  File "pdfminer/layout.py", line 575, in analyze
    LTTextGroup.analyze(self, laparams)
  File "pdfminer/layout.py", line 362, in analyze
    obj.analyze(laparams)
  File "pdfminer/layout.py", line 575, in analyze
    LTTextGroup.analyze(self, laparams)
  File "pdfminer/layout.py", line 362, in analyze
    obj.analyze(laparams)
  File "pdfminer/layout.py", line 577, in analyze
    self._objs.sort(
  File "pdfminer/layout.py", line 578, in <lambda>
    key=lambda obj: (1 - laparams.boxes_flow) * obj.x0
TypeError: unsupported operand type(s) for -: 'int' and 'str'
```

Related: Issue #477, PR #479

* update CHANGELOG

* merge CHANGELOG

* pdf2txt: clean up handling of layout parameter arguments
 * avoid specifying default values twice
 * construct LAParams earlier, rather than passing its components around
 * fix crash with --boxes_flow=disabled

* update CHANGELOG

* construct new LAParams, so _validate runs

* Improve readability of setting LAParams by explicitly copying them from parsed_args into init of LAParams. And move all parsed_args post processing to the parse_args() method.

* Add cli argument for line_overlap

* Also use default values from LAParams for --detect-vertical and --all-texts

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
2022-01-25 22:06:06 +01:00
..
Makefile webapp fixed 2010-12-25 08:41:35 +00:00
__init__.py tools must be a module to enable scripts tests 2014-09-04 09:47:33 +02:00
conv_afm.py Add type annotations (#661) 2021-10-09 16:23:28 +02:00
conv_cmap.py Add type annotations (#661) 2021-10-09 16:23:28 +02:00
conv_glyphlist.py Add type annotations (#661) 2021-10-09 16:23:28 +02:00
dumppdf.py Add type annotations (#661) 2021-10-09 16:23:28 +02:00
pdf2txt.py pdf2txt: clean up construction of LAParams from arguments (#682) 2022-01-25 22:06:06 +01:00
pdfdiff.py Add type annotations (#661) 2021-10-09 16:23:28 +02:00
pdfstats.py Add type annotations (#661) 2021-10-09 16:23:28 +02:00
prof.py Add type annotations (#661) 2021-10-09 16:23:28 +02:00