From 769dbb63437e9e7653a1e92d97e21e9855ed1654 Mon Sep 17 00:00:00 2001
From: Pieter Marsman
Date: Sat, 5 Nov 2022 16:30:39 +0100
Subject: [PATCH] Consistent instructions for how to install and use pdfminer.six (#793)

---
 README.md                            | 15 ++++++++++++---
 docs/source/index.rst                | 25 +++++++++++++++++++------
 docs/source/tutorial/commandline.rst |  4 ++--
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index b8c2542..182ffcf 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ How to use
 ----------
 
 * Install Python 3.6 or newer.
-* Install
+* Install pdfminer.six.
 
   `pip install pdfminer.six`
 
@@ -48,9 +48,18 @@ How to use
 
   `pip install 'pdfminer.six[image]'`
 
-* Use command-line interface to extract text from pdf:
+* Use the command-line interface to extract text from pdf.
 
-  `python pdf2txt.py samples/simple1.pdf`
+  `pdf2txt.py example.pdf`
+
+* Or use it with Python.
+
+```python
+from pdfminer.high_level import extract_text
+
+text = extract_text("example.pdf")
+print(text)
+```
 
 Contributing
 ------------
diff --git a/docs/source/index.rst b/docs/source/index.rst
index a6e666e..8650b5d 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -59,18 +59,31 @@ Features
 Installation instructions
 =========================
 
-Before using it, you must install it using Python 3.6 or newer.
+* Install Python 3.6 or newer.
+* Install pdfminer.six.
 
 ::
+    $ pip install pdfminer.six`
 
-    $ pip install pdfminer.six
-
-
-Optionally install extra dependencies that are needed to extract jpg images.
+* (Optionally) install extra dependencies for extracting images.
 
 ::
+    $ pip install 'pdfminer.six[image]'`
+
+* Use the command-line interface to extract text from pdf.
+
+::
+    $ pdf2txt.py example.pdf`
+
+* Or use it with Python.
+
+.. code-block:: python
+
+    from pdfminer.high_level import extract_text
+
+    text = extract_text("example.pdf")
+    print(text)
 
-    $ pip install 'pdfminer.six[image]'
 
 
 Contributing
diff --git a/docs/source/tutorial/commandline.rst b/docs/source/tutorial/commandline.rst
index 5aa352d..f780d36 100644
--- a/docs/source/tutorial/commandline.rst
+++ b/docs/source/tutorial/commandline.rst
@@ -18,7 +18,7 @@ pdf2txt.py
 
 ::
 
-    $ python tools/pdf2txt.py example.pdf
+    $ pdf2txt.py example.pdf
     all the text from the pdf appears on the command line
 
 The :ref:`api_pdf2txt` tool extracts all the text from a PDF. It uses layout
@@ -29,7 +29,7 @@ dumppdf.py
 
 ::
 
-    $ python tools/dumppdf.py -a example.pdf
+    $ dumppdf.py -a example.pdf
...