diff --git a/README.html b/README.html
index ba82d77..e8487e6 100644
--- a/README.html
+++ b/README.html
@@ -1,17 +1,23 @@
+
+
PDFMiner
-
+
+
-

PDFMiner

-
-Last Modified: Sun Apr 27 20:46:21 JST 2008
+Last Modified: Sun Apr 27 20:54:51 JST 2008
+
+
+

What's it?

PDFMiner is a suite of programs that aims to help extracting or analyzing text data from PDF documents.
@@ -49,7 +55,8 @@ http://www.unixuser.org/~euske/python/pdfminer/pdfminer-dist-20080427.tar.gz http://pdfminerr.googlecode.com/svn/
-


+
+

Installation

Prerequisite: Python 2.4 or newer.
@@ -81,7 +88,10 @@ Here is how: http://www.unixuser.org/~euske/pub/CMap.tar.bz2
-

  • $ tar jxf CMap.tar.bz2 +
  • Do the following: +
    +$ tar jxf CMap.tar.bz2
    +
  • Put the CMap directory into the pdfminer directory.
  • Go to the pdfminer directory.
  • Do the following: (this is optional but highly recommended)
    @@ -90,13 +100,15 @@ $ make cdbcmap
    -
    +
    +

    Usage

    PDFMiner comes with two programs: pdf2txt.py and dumppdf.py.
    +

    pdf2txt.py

    pdf2txt.py extracts text contents from a PDF file.
    @@ -149,6 +161,7 @@ By default, it extracts texts from all the pages.

    Increases the debug level.
    +

    dumppdf.py

    dumppdf.py dumps the internal contents of a PDF file
    @@ -199,16 +212,18 @@ no stream header is displayed for the ease of saving it to a file.

    Increases the debug level.
    -
    +
    +

    Changes

    -
    +
    +

    Related Projects

    -
    -
    +
    +

    Terms and conditions

    @@ -245,6 +260,6 @@ OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
    -


    +
    Yusuke Shinyama