pdfminer.six/samples
acroform Add section to documentation with howto for AcroForm fields extraction (#458) 2020-09-10 19:18:41 +02:00
contrib Fix extraction of some cjk characters (#593) 2021-08-26 21:05:03 +02:00
encryption Change pycryptodome dependency to the faster, smaller, and industry standard cryptography package (#456) 2020-07-20 22:00:54 +02:00
nonfree Fix failing test on develop & cleaning up test files (#319) 2019-10-26 18:42:33 +02:00
scancode Add a test for the previous fix 2017-10-16 12:35:16 +02:00
README Added: tests for extracting tests from pdfs with Type3 fonts (#205) 2019-10-22 18:15:59 +02:00
font-size-test.pdf Fix bug in computing character bounding box (#348) 2020-01-16 22:15:50 +01:00
jo.pdf add samples, fixed silly bugs. 2007-12-31 05:02:15 +00:00
sampleOneByteIdentityEncode.pdf Adds Test Case 2019-08-10 10:19:20 +05:30
simple1.pdf testcase added 2009-10-24 02:50:07 +00:00
simple2.pdf various cleanup for release. 2008-04-27 11:47:38 +00:00
simple3.pdf test file simple3.pdf added. 2010-08-29 06:39:41 +00:00
simple4.pdf Fix ordering of textlines within a textbox when boxes_flow is disabled (#412) 2020-05-09 15:37:49 +02:00

README

This directory contains sample PDF files.

These files (including the ones in the nonfree/ subdirectory) can be
distributed freely but do not come with explicit licensing
terms or source files.

Here are the credits of the original files:

simple1.pdf:
  (Originally taken from PDF Specification 1.7, 
  Appendix G. "Simple Text String Example" and modified)

simple2.pdf:
  (Originally taken from PDF Specification 1.7, 
  Appendix G. "Simple Graphics Example" and modified)

jo.pdf:
  Kenji Miyazawa (1896-1933, copyright expired)
  Preface of "Haru to Shura"
  (File generated from jo.tex by LaTeX and dvi2pdfm)

--
contrib/matplotlib.pdf
  Copyright 2018, James R Barlow
  Example file created in matplotlib to add a Type3 font to the samples
  Released under the terms of the "LICENSE" file

--
nonfree/cmp_itext_logo.pdf
  Bruno Lowagie
  "iText Logo - Type 3 font"
  http://gitlab.itextsupport.com/itext/sandbox/raw/master/cmpfiles/fonts/cmp_itext_logo.pdf

nonfree/dmca.pdf: 
  U.S. Copyright Office
  The Digital Millennium Copyright Act
  http://www.copyright.gov/legislation/dmca.pdf

nonfree/f1040nr.pdf:
  U.S. Department of the Treasury Internal Revenue Service
  Form 1040-NR, U.S. Nonresident Alien Income Tax Return
  http://www.irs.gov/pub/irs-pdf/f1040nr.pdf

nonfree/i1040nr.pdf:
  U.S. Department of the Treasury Internal Revenue Service
  Instructions for Form 1040-NR, U.S. Nonresident Alien Income Tax Return
  http://www.irs.gov/pub/irs-pdf/i1040nr.pdf

nonfree/kampo.pdf:
  National Printing Bureau of Japan
  Official Gazette, Vol. 4817
  http://kanpou.npb.go.jp/

nonfree/nlp2004slides.pdf:
  Yusuke Shinyama and Satoshi Sekine
  "Named Entity Discovery from Comparable News Corpora"

nonfree/naacl06-shinyama.pdf:
  Yusuke Shinyama and Satoshi Sekine
  "Preemptive Information Extraction using Unrestricted Relation Discovery"

--
The files in the encryption folder were generated with cpdf 1.7 [http://www.coherentpdf.com/]
from base.pdf, which was itself generated with LibreOffice 4.1.1.2, as follows:

cpdf -encrypt 40bit foo baz base.pdf -o rc4-40.pdf
cpdf -encrypt 128bit foo baz base.pdf -o rc4-128.pdf
cpdf -encrypt AES foo baz base.pdf -o aes-128.pdf
cpdf -encrypt AES foo baz base.pdf -no-encrypt-metadata -o aes-128-m.pdf
cpdf -encrypt AES256 foo baz base.pdf -o aes-256.pdf
cpdf -encrypt AES256 foo baz base.pdf -no-encrypt-metadata -o aes-256-m.pdf
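
The six commands above differ only in the encryption method, the optional
-no-encrypt-metadata flag, and the output name. As a minimal sketch (not part
of the original samples), the same command lines could be rebuilt from a small
table in Python. The names VARIANTS, build_command and COMMANDS are
hypothetical, and labelling "foo" as the owner password and "baz" as the user
password is an assumption based on cpdf's usual argument order.

```python
import shlex

# (cpdf method, skip metadata encryption?, output file), copied from the
# command list in this README.
VARIANTS = [
    ("40bit", False, "rc4-40.pdf"),
    ("128bit", False, "rc4-128.pdf"),
    ("AES", False, "aes-128.pdf"),
    ("AES", True, "aes-128-m.pdf"),
    ("AES256", False, "aes-256.pdf"),
    ("AES256", True, "aes-256-m.pdf"),
]


def build_command(method, plain_metadata, output,
                  owner_pw="foo", user_pw="baz", source="base.pdf"):
    """Return the cpdf argument list for one encrypted variant.

    owner_pw / user_pw are assumed labels for the two passwords that
    appear in the README commands.
    """
    args = ["cpdf", "-encrypt", method, owner_pw, user_pw, source]
    if plain_metadata:
        # Leave the document metadata unencrypted (the "-m" variants).
        args.append("-no-encrypt-metadata")
    args += ["-o", output]
    return args


# One shell-ready command line per variant.
COMMANDS = [shlex.join(build_command(*variant)) for variant in VARIANTS]

if __name__ == "__main__":
    for line in COMMANDS:
        print(line)
```

Running the script only prints the command lines. To regenerate the files,
each argument list could be passed to subprocess.run(args, check=True), which
requires cpdf on the PATH and base.pdf in the working directory.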