diff --git a/README.md b/README.md
index c78565a..101bec0 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,8 @@ pdfminer.six
 [![PyPI version](https://img.shields.io/pypi/v/pdfminer.six.svg)](https://pypi.python.org/pypi/pdfminer.six/)
 [![gitter](https://badges.gitter.im/pdfminer-six/Lobby.svg)](https://gitter.im/pdfminer-six/Lobby?utm_source=badge&utm_medium)
 
+*We fathom PDF*
+
 Pdfminer.six is a community maintained fork of the original PDFMiner. It is a
 tool for extracting information from PDF documents. It focuses on getting and
 analyzing text data. Pdfminer.six extracts the text from a page directly
diff --git a/docs/source/faq.rst b/docs/source/faq.rst
new file mode 100644
index 0000000..5a742d6
--- /dev/null
+++ b/docs/source/faq.rst
@@ -0,0 +1,41 @@
+.. _faq:
+
+Frequently asked questions
+**************************
+
+Why is it called pdfminer.six?
+==============================
+
+Pdfminer.six is a fork of the `original pdfminer created by Euske
+`_. Almost all of the code and architecture was in
+fact created by Euske. But for a long time, the original pdfminer did not
+support Python 3: until 2020 it only supported Python 2.
+The original goal of pdfminer.six was to add support for Python 3. This was
+done with the six package, which helps you write code that is
+compatible with both Python 2 and Python 3. Hence the name pdfminer.six.
+
+As of 2020, pdfminer.six has dropped support for Python 2 because Python 2
+reached `end-of-life `_. While the .six
+part is no longer applicable, we kept the name to avoid breaking changes for
+existing users.
+
+The current tagline "We fathom PDF" is a `whimsical reference
+`_
+to the six. To fathom means to understand something deeply, and a fathom is
+also a unit of length equal to six feet.
+
+How does pdfminer.six compare to other forks of pdfminer?
+==========================================================
+
+Pdfminer.six is now an independent and community-maintained package for
+extracting text from PDFs with Python. We actively fix bugs (including for
+PDFs that don't strictly follow the PDF Reference), add new features, and
+improve the usability of pdfminer.six. This community is what sets
+pdfminer.six apart from the other forks of the original pdfminer. PDF is a
+very diverse format, and there are countless deviations from the official
+specification. The only way to support all the PDFs out there is to have a
+community that actively uses and improves pdfminer.
+
+Since 2020, the original pdfminer has been `dormant
+`_, and pdfminer.six is the fork
+which Euske recommends if you need an actively maintained version of pdfminer.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index dd06e2d..d73fc04 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -13,6 +13,7 @@ Welcome to pdfminer.six's documentation!
     :target: https://gitter.im/pdfminer-six/Lobby?utm_source=badge&utm_medium
     :alt: gitter badge
 
+We fathom PDF.
 
 Pdfminer.six is a python package for extracting information from PDF documents.
 
@@ -38,6 +39,7 @@ pdfminer.six.
     howto/index
     topic/index
     reference/index
+   faq
 
 
 Features
diff --git a/docs/source/reference/highlevel.rst b/docs/source/reference/highlevel.rst
index 9d98ba6..b764e90 100644
--- a/docs/source/reference/highlevel.rst
+++ b/docs/source/reference/highlevel.rst
@@ -25,4 +25,6 @@ extract_pages
 =============
 
 .. currentmodule:: pdfminer.high_level
-.. autofunction:: extract_pages
\ No newline at end of file
+.. autofunction:: extract_pages
+
+.. _api_extract_pages:
\ No newline at end of file