Consistent instructions for how to install and use pdfminer.six (#793)

pull/829/head
Pieter Marsman 2022-11-05 16:30:39 +01:00 committed by GitHub
parent ad6587c697
commit 769dbb6343
3 changed files with 33 additions and 11 deletions

@@ -40,7 +40,7 @@ How to use
----------
* Install Python 3.6 or newer.
* Install
* Install pdfminer.six.
`pip install pdfminer.six`
@@ -48,9 +48,18 @@ How to use
`pip install 'pdfminer.six[image]'`
* Use command-line interface to extract text from pdf:
* Use the command-line interface to extract text from pdf.
`python pdf2txt.py samples/simple1.pdf`
`pdf2txt.py example.pdf`
* Or use it with Python.
```python
from pdfminer.high_level import extract_text
text = extract_text("example.pdf")
print(text)
```
Contributing
------------

@@ -59,18 +59,31 @@ Features
Installation instructions
=========================
Before using it, you must install it using Python 3.6 or newer.
* Install Python 3.6 or newer.
* Install pdfminer.six.
::
$ pip install pdfminer.six`
$ pip install pdfminer.six
Optionally install extra dependencies that are needed to extract jpg images.
* (Optionally) install extra dependencies for extracting images.
::
$ pip install 'pdfminer.six[image]'`
* Use the command-line interface to extract text from pdf.
::
$ pdf2txt.py example.pdf`
* Or use it with Python.
.. code-block:: python
from pdfminer.high_level import extract_text
text = extract_text("example.pdf")
print(text)
$ pip install 'pdfminer.six[image]'
Contributing

@@ -18,7 +18,7 @@ pdf2txt.py
::
$ python tools/pdf2txt.py example.pdf
$ pdf2txt.py example.pdf
all the text from the pdf appears on the command line
The :ref:`api_pdf2txt` tool extracts all the text from a PDF. It uses layout
@@ -29,7 +29,7 @@ dumppdf.py
::
$ python tools/dumppdf.py -a example.pdf
$ dumppdf.py -a example.pdf
<pdf><object id="1">
...
</object>
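
Note on the Python usage added in this commit: the snippet assumes that pdfminer.six is installed and that ``example.pdf`` exists. A minimal sketch of a more defensive variant follows. The helper name ``extract_text_or_none`` is hypothetical and not part of pdfminer.six; only ``pdfminer.high_level.extract_text`` is taken from the diff above.

```python
from pathlib import Path
from typing import Optional


def extract_text_or_none(pdf_path: str) -> Optional[str]:
    """Return the text of pdf_path, or None if it cannot be read here.

    Hypothetical helper for illustration; not part of pdfminer.six.
    """
    if not Path(pdf_path).is_file():
        # Nothing to extract: the file does not exist.
        return None
    try:
        # Same import that the documentation change introduces.
        from pdfminer.high_level import extract_text
    except ImportError:
        # pdfminer.six is not installed in this environment.
        return None
    return extract_text(pdf_path)


if __name__ == "__main__":
    text = extract_text_or_none("example.pdf")
    if text is None:
        print("example.pdf is missing or pdfminer.six is not installed")
    else:
        print(text)
```

Returning ``None`` instead of raising keeps the example short; a real script may prefer to let the exception propagate so that the cause of the failure stays visible.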