diff --git a/README.html b/README.html
index ba82d77..e8487e6 100644
--- a/README.html
+++ b/README.html
@@ -1,17 +1,23 @@
+
+PDFMiner is a suite of programs that aims to help extract or analyze text data from PDF documents.
@@ -49,7 +55,8 @@ http://www.unixuser.org/~euske/python/pdfminer/pdfminer-dist-20080427.tar.gz
http://pdfminerr.googlecode.com/svn/
-
Prerequisite: Python 2.4 or newer.
@@ -81,7 +88,10 @@ Here is how:
http://www.unixuser.org/~euske/pub/CMap.tar.bz2
-
$ tar jxf CMap.tar.bz2
+
+$ tar jxf CMap.tar.bz2
+
CMap directory into the pdfminer directory.
pdfminer directory.
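The CMap installation steps above can be sketched in Python with the standard `tarfile` and `shutil` modules. This is a minimal sketch, not part of PDFMiner: the helper name `install_cmap` and its path parameters are hypothetical, and it assumes the archive unpacks to a top-level `CMap/` directory, as the `tar jxf CMap.tar.bz2` step suggests.

```python
import shutil
import tarfile
from pathlib import Path


def install_cmap(archive: str, pdfminer_dir: str, workdir: str = ".") -> Path:
    """Hypothetical helper: unpack CMap.tar.bz2 and move CMap/ into pdfminer_dir.

    Assumes the archive contains a top-level CMap/ directory.
    """
    # Equivalent of `tar jxf CMap.tar.bz2`: open as a bzip2-compressed tarball.
    with tarfile.open(archive, "r:bz2") as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(workdir, filter="data")
        else:
            tar.extractall(workdir)

    src = Path(workdir) / "CMap"
    dest = Path(pdfminer_dir) / "CMap"
    if dest.exists():
        # Refuse to overwrite an existing CMap installation.
        raise FileExistsError(f"{dest} already exists")
    # Put the extracted CMap directory into the pdfminer directory.
    shutil.move(str(src), str(dest))
    return dest
```

For example, `install_cmap("CMap.tar.bz2", "pdfminer")` run from the top of the source tree would leave the files under `pdfminer/CMap/`, assuming that layout matches your checkout.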
PDFMiner comes with two programs: pdf2txt.py and dumppdf.py.
+
pdf2txt.py extracts text contents from a PDF file.
@@ -149,6 +161,7 @@ By default, it extracts texts from all the pages.
dumppdf.py dumps the internal contents of a PDF file
@@ -199,16 +212,18 @@ no stream header is displayed for the ease of saving it to a file.
@@ -245,6 +260,6 @@ OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-