From 5b8874ff059db55bab306a8108549ceac3d7e8f6 Mon Sep 17 00:00:00 2001
From: "yusuke.shinyama.dummy"
Date: Sun, 27 Apr 2008 11:55:51 +0000
Subject: [PATCH] documentation

git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@25 1aa58f4a-7d42-0410-adbc-911cccaed67c
---
 README.html | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/README.html b/README.html
index ba82d77..e8487e6 100644
--- a/README.html
+++ b/README.html
@@ -1,17 +1,23 @@
+ + PDFMiner - + + -

PDFMiner

-
-Last Modified: Sun Apr 27 20:46:21 JST 2008 +Last Modified: Sun Apr 27 20:54:51 JST 2008
+ +
+

What's it?

PDFMiner is a suite of programs that aims to help extract or analyze text data from PDF documents. @@ -49,7 +55,8 @@ http://www.unixuser.org/~euske/python/pdfminer/pdfminer-dist-20080427.tar.gz http://pdfminerr.googlecode.com/svn/ -


+ +

Installation

Prerequisite: Python 2.4 or newer. @@ -81,7 +88,10 @@ Here is how: http://www.unixuser.org/~euske/pub/CMap.tar.bz2 -

  • $ tar jxf CMap.tar.bz2 +
  • Do the following: +
    +$ tar jxf CMap.tar.bz2
    +
  • Put the CMap directory into the pdfminer directory.
  • Go to the pdfminer directory.
  • Do the following: (this is optional but highly recommended)
    @@ -90,13 +100,15 @@ $ make cdbcmap -
    + +
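
The CMap installation steps above can also be scripted. The following is a minimal sketch, not part of PDFMiner itself: the function name `install_cmap` and its parameters are hypothetical, it assumes the archive unpacks to a top-level `CMap` directory as the steps describe, and it uses the modern Python 3 standard library rather than the Python 2.4 baseline named in the prerequisites.

```python
import shutil
import subprocess
import tarfile
from pathlib import Path


def install_cmap(archive, pdfminer_dir, build_cdb=False):
    """Unpack CMap.tar.bz2 and place the CMap directory inside pdfminer_dir.

    Hypothetical helper that mirrors the manual steps; it is not shipped
    with PDFMiner.
    """
    archive = Path(archive)
    pdfminer_dir = Path(pdfminer_dir)

    # Equivalent of `tar jxf CMap.tar.bz2`, unpacking next to the archive.
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(archive.parent)

    # Put the CMap directory into the pdfminer directory
    # (skipped when the archive was already unpacked there).
    source = archive.parent / "CMap"
    target = pdfminer_dir / "CMap"
    if source.resolve() != target.resolve():
        shutil.move(str(source), str(target))

    # Optional but recommended step: run `make cdbcmap` in the pdfminer directory.
    if build_cdb:
        subprocess.run(["make", "cdbcmap"], cwd=pdfminer_dir, check=True)

    return target
```

A call such as `install_cmap("CMap.tar.bz2", "pdfminer", build_cdb=True)` would then cover all of the steps, provided `make` is available and the pdfminer directory contains the Makefile with the `cdbcmap` target.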

    Usage

    PDFMiner comes with two programs: pdf2txt.py and dumppdf.py. +

    pdf2txt.py

    pdf2txt.py extracts text content from a PDF file. @@ -149,6 +161,7 @@ By default, it extracts text from all the pages.

    Increases the debug level. +

    dumppdf.py

    dumppdf.py dumps the internal contents of a PDF file @@ -199,16 +212,18 @@ no stream header is displayed for the ease of saving it to a file.

    Increases the debug level. -
    + +

    Changes

    • 2008/04/27: Basic encryption and LZW decoding support added.
    • 2008/01/07: Several bugfixes. Thanks to Nick Fabry for his contribution.
    • 2007/12/31: Initial release. -
    • 2004/12/24: Start writing the code... +
    • 2004/12/24: Start writing the code out of boredom...
    -
    + +

    Related Projects

    • pyPdf @@ -216,8 +231,8 @@ no stream header is displayed for the ease of saving it to a file.
    • pdfbox
    - -
    + +

    Terms and conditions

    @@ -245,6 +260,6 @@ OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -


    +
    Yusuke Shinyama