How to use
-
Install Python 3.6 or newer.
-
Install pdfminer.six.
pip install pdfminer.six
-
Optionally, install the extra dependencies for extracting images.
pip install 'pdfminer.six[image]'
-
Install the Tesseract OCR engine (required by pytesseract) and the Poppler utilities. On Debian or Ubuntu:
sudo apt install tesseract-ocr
sudo apt install poppler-utils
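
Once the steps above are done, a quick check can confirm that everything is visible from Python. The following is only a minimal sketch, not part of this project: `check_setup` is a hypothetical helper name, and it assumes that pdfminer.six is importable as `pdfminer`, that tesseract-ocr puts a `tesseract` executable on the PATH, and that poppler-utils provides `pdftoppm`.

```python
import importlib.util
import shutil
import sys


def check_setup():
    """Return a dict mapping each requirement to True (found) or False (missing).

    Hypothetical helper for illustration; it only looks things up and
    never installs or modifies anything.
    """
    return {
        # Interpreter version required by the first step.
        "python>=3.6": sys.version_info >= (3, 6),
        # pdfminer.six is imported under the package name "pdfminer".
        "pdfminer.six": importlib.util.find_spec("pdfminer") is not None,
        # Executable shipped by the tesseract-ocr system package.
        "tesseract": shutil.which("tesseract") is not None,
        # One of the command-line tools shipped by poppler-utils.
        "pdftoppm (poppler-utils)": shutil.which("pdftoppm") is not None,
    }


if __name__ == "__main__":
    for name, found in check_setup().items():
        status = "ok" if found else "MISSING"
        print(f"{status:8}{name}")
```

Any entry reported as MISSING points back to the corresponding install step. The optional image extras are not checked here, since they are not required.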