add regression tests.

git-svn-id: https://pdfminerr.googlecode.com/svn/trunk/pdfminer@189 1aa58f4a-7d42-0410-adbc-911cccaed67c
pull/1/head
yusuke.shinyama.dummy 2010-03-22 04:34:52 +00:00
parent cd39642abe
commit fa13122f09
30 changed files with 60026 additions and 14 deletions

@@ -19,7 +19,7 @@ Python PDF parser and analyzer
 <div align=right class=lastmod>
 <!-- hhmts start -->
-Last Modified: Sat Mar 20 05:43:04 UTC 2010
+Last Modified: Mon Mar 22 04:34:28 UTC 2010
 <!-- hhmts end -->
 </div>
@@ -334,7 +334,6 @@ no stream header is displayed for the ease of saving it to a file.
 <hr noshade>
 <h2>TODOs</h2>
 <ul>
-<li> Automated testing.
 <li> <A href="http://www.python.org/dev/peps/pep-0008/">PEP-8</a> and
 <a href="http://www.python.org/dev/peps/pep-0257/">PEP-257</a> conformance.
 <li> Better text extraction / layout analysis.
@@ -348,7 +347,7 @@ no stream header is displayed for the ease of saving it to a file.
 <hr noshade>
 <h2>Changes</h2>
 <ul>
-<li> 2010/03/xx: Improved layout analysis.
+<li> 2010/03/22: Improved layout analysis. Added regression tests.
 <li> 2010/03/12: A couple of bugfixes. Thanks to Sean Manefield.
 <li> 2010/02/27: Changed the way of internal layout handling. (LTTextItem -&gt; LTChar)
 <li> 2010/02/15: Several bugfixes. Thanks to Sean.

@@ -488,18 +488,18 @@ def group_boxes(groupfunc, objs, distfunc):
     while 2 <= len(objs):
         mindist = INF
         minpair = None
-        objs.sort(key=lambda obj: obj.width*obj.height)
-        for (i,obj0) in enumerate(objs):
-            for obj1 in objs[i+1:]:
-                d = distfunc(obj0, obj1)
+        objs.sort(key=lambda obj: (obj.width*obj.height, obj.y0))
+        for i in xrange(len(objs)):
+            for j in xrange(i+1, len(objs)):
+                d = distfunc(objs[i], objs[j])
                 if d < mindist:
                     mindist = d
-                    minpair = (obj0, obj1)
+                    minpair = (objs[i], objs[j])
         assert minpair
-        (obj0, obj1) = minpair
-        objs.remove(obj0)
-        objs.remove(obj1)
-        objs.append(groupfunc([obj0, obj1]))
+        (obj1, obj2) = minpair
+        objs.remove(obj1)
+        objs.remove(obj2)
+        objs.append(groupfunc([obj1, obj2]))
     assert len(objs) == 1
     return objs.pop()
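
The effect of the extra sort key in this hunk can be tried in isolation. The following is a minimal sketch in Python 3 (so `range` replaces `xrange`), not the project's actual layout code: `Box`, `merge` and `wasted_area` are hypothetical stand-ins for the real layout objects, `groupfunc` and `distfunc`, and the function copies its input list, which the original may not do. It suggests why sorting on `(area, y0)` could make the grouping independent of input order when several boxes have the same area but different `y0`.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a layout object with a bounding box."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self):
        return self.x1 - self.x0

    @property
    def height(self):
        return self.y1 - self.y0


def merge(boxes):
    """Hypothetical groupfunc: enclosing box, named after its members."""
    return Box("(" + "+".join(b.name for b in boxes) + ")",
               min(b.x0 for b in boxes), min(b.y0 for b in boxes),
               max(b.x1 for b in boxes), max(b.y1 for b in boxes))


def wasted_area(a, b):
    """Hypothetical distfunc: area of the merged box covered by neither input."""
    m = merge([a, b])
    return m.width * m.height - a.width * a.height - b.width * b.height


def group_boxes(groupfunc, objs, distfunc):
    objs = list(objs)  # work on a copy (an assumption of this sketch)
    assert objs
    while 2 <= len(objs):
        mindist = INF
        minpair = None
        # The secondary key y0 orders equal-area boxes the same way
        # no matter how the input list was arranged.
        objs.sort(key=lambda obj: (obj.width * obj.height, obj.y0))
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                d = distfunc(objs[i], objs[j])
                if d < mindist:
                    mindist = d
                    minpair = (objs[i], objs[j])
        assert minpair
        (obj1, obj2) = minpair
        objs.remove(obj1)
        objs.remove(obj2)
        objs.append(groupfunc([obj1, obj2]))
    assert len(objs) == 1
    return objs.pop()
```

With three equal-area boxes stacked vertically, every input ordering yields the same grouping tree in this sketch. Boxes that tie on both area and `y0` would still depend on input order, because the sort is stable.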

@@ -53,12 +53,12 @@ xmls: $(XMLS)
 .pdf.html:
 	$(PDF2TXT) -t html $< > $@
-#	$(CMP) $@ $@.ref
+	$(CMP) $@ $@.ref
 .pdf.xml:
 	$(PDF2TXT) -t xml $< > $@
-#	$(CMP) $@ $@.ref
+	$(CMP) $@ $@.ref
 .pdf.txt:
 	$(PDF2TXT) -t text $< > $@
-#	$(CMP) $@ $@.ref
+	$(CMP) $@ $@.ref
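
The newly enabled `$(CMP) $@ $@.ref` step checks each generated file against a stored reference. The definition of `$(CMP)` is not part of this hunk, so the sketch below only assumes it is a plain file comparison. `diff_against_reference` is a hypothetical helper, and UTF-8 is assumed because the HTML and XML references declare it.

```python
import difflib
from pathlib import Path


def diff_against_reference(output_path):
    """Return unified-diff lines between a generated file and its ".ref" twin.

    An empty list means the output matches the reference exactly.
    """
    out = Path(output_path)
    ref = out.with_name(out.name + ".ref")  # e.g. dmca.txt -> dmca.txt.ref
    out_lines = out.read_text(encoding="utf-8").splitlines(keepends=True)
    ref_lines = ref.read_text(encoding="utf-8").splitlines(keepends=True)
    return list(difflib.unified_diff(ref_lines, out_lines,
                                     fromfile=str(ref), tofile=str(out)))
```

Returning the diff lines rather than a boolean shows what changed when a regression appears, which a byte-level comparison alone would not.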

1953
samples/dmca.html.ref Normal file

File diff suppressed because it is too large

59
samples/dmca.txt.ref Normal file

@@ -0,0 +1,59 @@
THE DIGITAL MILLENNIUM COPYRIGHT ACT OF 1998
U.S. Copyright Office Summary
INTRODUCTION
December 1998
The Digital Millennium Copyright Act (DMCA) was signed into law by
1
President Clinton on October 28, 1998. The legislation implements two 1996 World
Intellectual Property Organization (WIPO) treaties: the WIPO Copyright Treaty and
the WIPO Performances and Phonograms Treaty. The DMCA also addresses a
number of other significant copyright-related issues.
The DMCA is divided into five titles:
!
!
!
!
!
Title I, the “WIPO Copyright and Performances and Phonograms
Treaties Implementation Act of 1998,” implements the WIPO
treaties.
Title II, the “Online Copyright Infringement Liability Limitation
Act,” creates limitations on the liability of online service providers for
copyright infringement when engaging in certain types of activities.
Title III, the “Computer Maintenance Competition Assurance
Act,” creates an exemption for making a copy of a computer program
by activating a computer for purposes of maintenance or repair.
Title IV contains six miscellaneous provisions, relating to the
functions of the Copyright Office, distance education, the exceptions
in the Copyright Act for libraries and for making ephemeral recordings,
“webcasting” of sound recordings on the Internet, and the applicability
of collective bargaining agreement obligations in the case of transfers
of rights in motion pictures.
Title V, the “Vessel Hull Design Protection Act,” creates a new form
of protection for the design of vessel hulls.
This memorandum summarizes briefly each title of the DMCA. It provides
merely an overview of the laws provisions; for purposes of length and readability a
significant amount of detail has been omitted. A complete understanding of any
provision of the DMCA requires reference to the text of the legislation itself.
Copyright Office Summary
December 1998
Page 1
Pub. L. No. 105-304, 112 Stat. 2860 (Oct. 28, 1998).
1

2230
samples/dmca.xml.ref Normal file

File diff suppressed because it is too large

4268
samples/f1040nr.html.ref Normal file

File diff suppressed because it is too large

322
samples/f1040nr.txt.ref Normal file

@@ -0,0 +1,322 @@
1040NR
Form
Department of the Treasury
beginning
Internal Revenue Service
Your first name and initial
U.S. Nonresident Alien Income Tax Return
For the year January 1December 31, 2007, or other tax year
, 2007, and ending
Last name
Present home address (number, street, and apt. no., or rural route). If you have a P.O. box, see page 8.
City, town or post office, state, and ZIP code. If you have a foreign address, see page 8.
OMB No. 1545-0074
2007
, 20
Identifying number (see page 8)
Check if:
Individual
Estate or Trust
Type of entry visa (see page 8)
'
Of what country were you a citizen or national during the tax year? '
Country '
Give address outside the United States to which you want any
Give address in the country where you are a permanent resident.
refund check mailed. If same as above, write “Same.”
If same as above, write “Same.”
Filing Status and Exemptions for Individuals (see page 8)
Filing status. Check only one box (16 below).
1
Single resident of Canada or Mexico, or a single U.S. national
2
Other single nonresident alien
3
Married resident of Canada or Mexico, or a married U.S. national
4
Married resident of the Republic of Korea (South Korea)
5
Other married nonresident alien
6
Qualifying widow(er) with dependent child (see page 9)
Caution: Do not check box 7a if your parent (or someone else) can claim you as a dependent.
Do not check box 7b if your spouse had any U.S. gross income.
7 c
Dependents: (see page 9)
%
(1) First name
Last name
(3) Dependents
relationship
to you
(4)
if qualifying
child for child tax
credit (see page 9)
If you check box 7b, enter your spouses
identifying number '
$
7 a
Yourself
7 b
Spouse
(2) Dependents
identifying number
...
...
...
...
...
...
...
...
No. of boxes checked
on 7a and 7b
No. of children on
7c who:
c lived with you
c did not live with
you due to divorce
or separation
Dependents on 7c
not entered above
Add numbers entered
on lines above
8
9 a
'
'
'
'
'
Total number of exemptions claimed
Wages, salaries, tips, etc. Attach Form(s) W-2
Taxable interest
Tax-exempt interest. Do not include on line 9a
Ordinary dividends
1 0b
Qualified dividends (see page 11)
1 1
Taxable refunds, credits, or offsets of state and local income taxes (see page 11)
1 2
Scholarship and fellowship grants. Attach Form(s) 1042-S or required statement (see page 11)
1 3
Business income or (loss). Attach Schedule C or C-EZ (Form 1040)
1 4
Capital gain or (loss). Attach Schedule D (Form 1040) if required. If not required, check here
1 5
Other gains or (losses). Attach Form 4797
16a
1 6b
16a
Taxable amount (see page 12)
IRA distributions
17a
1 7b
17a
Taxable amount (see page 13)
Pensions and annuities
1 8
Rental real estate, royalties, partnerships, trusts, etc. Attach Schedule E (Form 1040)
1 9
Farm income or (loss). Attach Schedule F (Form 1040)
2 0
Unemployment compensation
2 1
Other income. List type and amount (see page 15)
2 2
2 2
Total income exempt by a treaty from page 5, Item M
2 3
Add lines 8, 9a, 10a, 1115, 16b, and 17b21. This is your total effectively connected income '
2 4
2 4
Educator expenses (see page 15)
2 5
2 5
Health savings account deduction. Attach Form 8889
2 6
2 6
Moving expenses. Attach Form 3903
2 7
2 7
Self-employed SEP, SIMPLE, and qualified plans
2 8
2 8
Self-employed health insurance deduction (see page 16)
2 9
2 9
Penalty on early withdrawal of savings
3 0
3 0
Scholarship and fellowship grants excluded
3 1
IRA deduction (see page 16)
3 1
3 2
3 2
Student loan interest deduction (see page 16)
3 3
3 3
Domestic production activities deduction. Attach Form 8903
3 4
Add lines 24 through 33
3 5
Subtract line 34 from line 23. Enter here and on line 36. This is your adjusted gross income '
For Disclosure, Privacy Act, and Paperwork Reduction Act Notice, see page 32.
Cat. No. 11364D
8
9 a
b
10a
b
9 b
d
1 0a
1 1
1 2
1 3
1 4
1 5
1 6b
1 7b
1 8
1 9
2 0
2 1
2 3
3 4
3 5
Form 1040NR (2007)
Please print or type. Attach Forms W-2 here.Also attach Form(s) 1099-R if tax was withheld. Income Effectively Connected With U.S. Trade/Business Enclose, but do not attach, any payment. Adjusted Gross Income

6011
samples/f1040nr.xml.ref Normal file

File diff suppressed because it is too large

5506
samples/i1040nr.html.ref Normal file

File diff suppressed because it is too large

186
samples/i1040nr.txt.ref Normal file

@@ -0,0 +1,186 @@
PAGER/SGML
Page 1 of 48
Leadpct: 0% Pt. size: 9.5 ❏ Draft
Userid: ________ DTD INSTR04
(Init. & date)
Fileid:
D:\USERS\8fllb\documents\epicfiles\2007Instructions1040NR.sgm
7:48 - 6-DEC-2007
Instructions for Form 1040NR
❏ Ok to Print
The type and rule above prints on all proofs including departmental reproduction proofs. MUST be removed before printing.
2007
Instructions for
Form 1040NR
U.S. Nonresident Alien Income Tax Return
Department of the Treasury
Internal Revenue Service
use a different address this year. See
Section references are to the Internal
Where To File on page 4.
Revenue Code unless otherwise noted.
General Instructions deduction. The deduction rate for
Domestic production activities
You may be able to deduct up to an
2007 is increased to 6%.
additional $3,000 if you were a
Whats New for 2007
Unreported social security and
participant in a 401(k) plan and your
Medicare tax on wages. If you are
employer was in bankruptcy in an
Tax benefits extended. The following
an employee and your employer did not
earlier year.
tax benefits were extended through
withhold social security and Medicare
2007.
Personal exemption and itemized
• Deduction for educator expenses in
tax, see Form 8919 to figure and report
deduction phaseouts reduced.
this tax.
figuring adjusted gross income.
Taxpayers with adjusted gross income
• District of Columbia first-time
Refundable credit for prior-year
above a certain amount may lose part
minimum tax. If you have an unused
homebuyer credit.
of their deduction for personal
minimum tax credit carryforward from
Alternative minimum tax (AMT)
exemptions and itemized deductions.
2004, see Form 8801 to find if you can
exemption amount decreased. The
The amount by which these deductions
take this credit.
AMT exemption amount is decreased to
are reduced in 2008 will be only 1/2 of
Health savings account (HSA)
$33,750 ($45,000 if a qualifying
the amount of the reduction that
funding distributions. You may be
widow(er); $22,500 if married filing
otherwise would have applied in 2007.
able to elect to exclude from income a
separately).
Capital gain tax rate reduced. The
distribution made from your IRA to your
At the time these instructions
5% capital gain tax rate is reduced to
HSA. See the instructions for lines 16a
!
went to print, Congress was
zero.
and 16b beginning on page 12.
considering legislation that
CAUTION
New recordkeeping requirements for
Tax on childrens income. Form
would increase the amounts above. To
contributions of money. For
8615 will be required to figure the tax
find out if this legislation was enacted,
charitable contributions of money,
for the following children with
and for more details, see the
regardless of the amount, you must
investment income of more than
Instructions for Form 6251.
maintain as a record of the contribution $1,800.
IRA deduction expanded. If you were a bank record (such as a cancelled
1. Children under age 18 at the end
covered by a retirement plan, you may
check) or a written record from the
of 2008.
be able to take an IRA deduction if your
charity. The written record must include
2. The following children if their
2007 modified adjusted gross income
the name of the charity, date, and
earned income is not more than half
(AGI) is less than $62,000 ($103,000 if
amount of the contribution. See Gifts to
their support.
a qualifying widow(er)).
U.S. Charities that begins on page 26.
a. Children age 18 at the end of
You may be able to deduct up to an
Exemption for housing a person
2008.
additional $3,000 if you were a
displaced by Hurricane Katrina
b. Children over age 18 and under
participant in a 401(k) plan and your
expires. The additional exemption
age 24 at the end of 2008 who are
employer was in bankruptcy in an
amount for housing a person displaced
full-time students.
earlier year.
by Hurricane Katrina does not apply for
2007 or later years.
The election to report a childs
Standard mileage rates. The 2007
investment income on a parents return
Telephone excise tax credit. This
rate for business use of your vehicle is
and the special rule for when a child
481/2 cents a mile. The 2007 rate for
credit was available only on your 2006
must file Form 6251 will also apply to
use of your vehicle to move is 20 cents
return. If you filed but did not request it
the children listed above.
a mile. The special rate for charitable
on your 2006 return, file Form 1040X
use of your vehicle to provide relief
using a simplified procedure explained
Expiring tax benefits. The following
related to Hurricane Katrina has
in its instructions to amend your 2006
benefits are scheduled to expire and
expired.
return. If you were not required to file a
will not apply for 2008.
• Deduction for educator expenses in
2006 return, see the 2006 Form
Elective salary deferrals. The
1040EZ-T.
maximum amount you can defer under
figuring adjusted gross income.
• The exclusion from income of
all plans is generally limited to $15,500
Whats New for 2008
($10,500 if you only have SIMPLE
qualified charitable deductions.
• Credit for nonbusiness energy
plans; $18,500 for section 403(b) plans
IRA deduction expanded. You may
if you qualify for the 15-year rule). See
property.
be able to deduct up to $5,000 ($6,000
• District of Columbia first-time
the instructions for line 8 on page 10.
if age 50 or older at the end of the
Mailing your return. If you are filing
homebuyer credit (for homes
year). You may be able to take an IRA
purchased after 2007).
deduction if you were covered by a
the return for an estate or trust, you will
retirement plan and your 2008 modified
AGI is less than $63,000 ($105,000) if a
qualifying widow(er)).
Cat. No. 11368V

6273
samples/i1040nr.xml.ref Normal file

File diff suppressed because it is too large

1034
samples/jo.html.ref Normal file

File diff suppressed because it is too large

964
samples/jo.txt.ref Normal file

@@ -0,0 +1,964 @@
[898 lines, each a single "?" (non-ASCII characters lost in extraction), omitted]

4820
samples/jo.xml.ref Normal file

File diff suppressed because it is too large

2853
samples/kampo.html.ref Normal file

File diff suppressed because it is too large

2717
samples/kampo.txt.ref Normal file

File diff suppressed because it is too large

13623
samples/kampo.xml.ref Normal file

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,87 @@
Preemptive Information Extraction using Unrestricted Relation Discovery
Yusuke Shinyama
Satoshi Sekine
New York University
715, Broadway, 7th Floor
New York, NY, 10003
{yusuke,sekine}@cs.nyu.edu
We are trying to extend the boundary of
Information Extraction (IE) systems. Ex-
isting IE systems require a lot of time and
human effort to tune for a new scenario.
Preemptive Information Extraction is an
attempt to automatically create all feasible
IE systems in advance without human in-
tervention. We propose a technique called
Unrestricted Relation Discovery that dis-
covers all possible relations from texts and
presents them as tables. We present a pre-
liminary system that obtains reasonably
good results.
Abstract
1 Background
Every day, a large number of news articles are cre-
ated and reported, many of which are unique. But
certain types of events, such as hurricanes or mur-
ders, are reported again and again throughout a year.
The goal of Information Extraction, or IE, is to re-
trieve a certain type of news event from past articles
and present the events as a table whose columns are
filled with a name of a person or company, accord-
ing to its role in the event. However, existing IE
techniques require a lot of human labor. First, you
have to specify the type of information you want and
collect articles that include this information. Then,
you have to analyze the articles and manually craft
a set of patterns to capture these events. Most exist-
ing IE research focuses on reducing this burden by
helping people create such patterns. But each time
you want to extract a different kind of information,
you need to repeat the whole process: specify arti-
cles and adjust its patterns, either manually or semi-
automatically. There is a bit of a dangerous pitfall
here. First, it is hard to estimate how good the sys-
tem can be after months of work. Furthermore, you
might not know if the task is even doable in the first
place. Knowing what kind of information is easily
obtained in advance would help reduce this risk.
An IE task can be defined as finding a relation
among several entities involved in a certain type of
For example, in the MUC-6 management
event.
succession scenario, one seeks a relation between
COMPANY, PERSON and POST involved with hir-
ing/firing events. For each row of an extracted ta-
ble, you can always read it as “COMPANY hired
(or fired) PERSON for POST.” The relation between
these entities is retained throughout the table. There
are many existing works on obtaining extraction pat-
terns for pre-defined relations (Riloff, 1996; Yangar-
ber et al., 2000; Agichtein and Gravano, 2000; Sudo
et al., 2003).
Unrestricted Relation Discovery is a technique to
automatically discover such relations that repeatedly
appear in a corpus and present them as a table, with
absolutely no human intervention. Unlike most ex-
isting IE research, a user does not specify the type
of articles or information wanted. Instead, a system
tries to find all the kinds of relations that are reported
multiple times and can be reported in tabular form.
This technique will open up the possibility of try-
ing new IE scenarios. Furthermore, the system itself
can be used as an IE system, since an obtained re-
lation is already presented as a table. If this system
works to a certain extent, tuning an IE system be-
comes a search problem: all the tables are already
built “preemptively.” A user only needs to search
for a relevant table.

File diff suppressed because it is too large

@@ -0,0 +1,89 @@
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body>
<span style="position:absolute; border: gray 1px solid; left:0px; top:50px; width:800px; height:600px;"></span>
<div style="position:absolute; top:50px;"><a name="1">Page 1</a></div>
<span style="position:absolute; border: blue 1px solid; left:62px; top:126px; width:672px; height:157px;"></span>
<span style="position:absolute; left:62px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:110px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:158px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:206px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:254px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:302px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:350px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:398px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:446px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:494px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:542px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:590px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:638px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:686px; top:126px; font-size:85px;">?</span>
<span style="position:absolute; left:62px; top:198px; font-size:85px;">?</span>
<span style="position:absolute; left:110px; top:198px; font-size:85px;">?</span>
<span style="position:absolute; left:158px; top:198px; font-size:85px;">?</span>
<span style="position:absolute; left:206px; top:198px; font-size:85px;">?</span>
<span style="position:absolute; left:254px; top:198px; font-size:85px;">?</span>
<span style="position:absolute; left:302px; top:198px; font-size:85px;">?</span>
<span style="position:absolute; left:350px; top:198px; font-size:85px;">?</span>
<span style="position:absolute; border: blue 1px solid; left:263px; top:374px; width:468px; height:212px;"></span>
<span style="position:absolute; left:576px; top:374px; font-size:64px;">?</span>
<span style="position:absolute; left:612px; top:374px; font-size:64px;">?</span>
<span style="position:absolute; left:648px; top:374px; font-size:64px;">?</span>
<span style="position:absolute; left:660px; top:374px; font-size:64px;">?</span>
<span style="position:absolute; left:696px; top:374px; font-size:64px;">?</span>
<span style="position:absolute; left:612px; top:430px; font-size:64px;">?</span>
<span style="position:absolute; left:648px; top:430px; font-size:64px;">?</span>
<span style="position:absolute; left:684px; top:430px; font-size:64px;">?</span>
<span style="position:absolute; left:696px; top:430px; font-size:64px;">?</span>
<span style="position:absolute; left:263px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:285px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:304px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:332px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:352px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:371px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:383px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:401px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:415px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:424px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:444px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:462px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:469px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:487px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:506px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:523px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:541px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:550px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:573px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:591px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:611px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:628px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:642px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:654px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:683px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:701px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:719px; top:493px; font-size:50px;">?</span>
<span style="position:absolute; left:424px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:447px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:464px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:488px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:498px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:519px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:538px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:551px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:570px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:579px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:602px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:621px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:629px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:646px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:664px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:678px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:694px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:702px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; left:714px; top:537px; font-size:50px;">?</span>
<span style="position:absolute; border: black 1px solid; left:0px; top:50px; width:800px; height:600px;"></span>
<span style="position:absolute; border: black 1px solid; left:50px; top:308px; width:510px; height:0px;"></span>
<span style="position:absolute; border: green 1px solid; left:25px; top:587px; width:41px; height:40px;"></span>
<span style="position:absolute; border: red 1px solid; left:62px; top:126px; width:672px; height:460px;"></span>
<div style="position:absolute; top:0px;">Page: <a href="#1">1</a></div>
</body></html>

@@ -0,0 +1,9 @@
??????????????
???????
?????
????
???????????????????????????
???????????????????

@@ -0,0 +1,120 @@
<?xml version="1.0" encoding="utf-8" ?>
<pages>
<page id="1" bbox="0.000,0.000,800.000,600.000" rotate="0">
<textbox id="0" bbox="62.000,365.240,734.000,523.160">
<textline bbox="62.000,437.240,734.000,523.160">
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="62.000,437.240,110.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="110.000,437.240,158.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="158.000,437.240,206.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="206.000,437.240,254.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="254.000,437.240,302.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="302.000,437.240,350.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="350.000,437.240,398.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="398.000,437.240,446.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="446.000,437.240,494.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="494.000,437.240,542.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="542.000,437.240,590.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="590.000,437.240,638.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="638.000,437.240,686.000,523.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="686.000,437.240,734.000,523.160" size="85.920">?</text>
<text>
</text>
</textline>
<textline bbox="62.000,365.240,398.000,451.160">
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="62.000,365.240,110.000,451.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="110.000,365.240,158.000,451.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="158.000,365.240,206.000,451.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="206.000,365.240,254.000,451.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="254.000,365.240,302.000,451.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="302.000,365.240,350.000,451.160" size="85.920">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="350.000,365.240,398.000,451.160" size="85.920">?</text>
<text>
</text>
</textline>
</textbox>
<textbox id="1" bbox="263.532,62.640,732.000,275.120">
<textline bbox="576.012,210.680,732.000,275.120">
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="576.012,210.680,612.012,275.120" size="64.440">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="612.012,210.680,648.012,275.120" size="64.440">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="648.012,210.680,660.000,275.120" size="64.440">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="660.000,210.680,696.000,275.120" size="64.440">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="696.000,210.680,732.000,275.120" size="64.440">?</text>
<text>
</text>
</textline>
<textline bbox="612.012,154.680,732.000,219.120">
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="612.012,154.680,648.012,219.120" size="64.440">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="648.012,154.680,684.012,219.120" size="64.440">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="684.012,154.680,696.000,219.120" size="64.440">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="696.000,154.680,732.000,219.120" size="64.440">?</text>
<text>
</text>
</textline>
<textline bbox="263.532,106.640,732.000,156.760">
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="263.532,106.640,285.736,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="285.736,106.640,304.496,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="304.496,106.640,332.776,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="332.776,106.640,352.600,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="352.600,106.640,371.444,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="371.444,106.640,383.596,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="383.596,106.640,401.432,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="401.432,106.640,415.208,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="415.208,106.640,424.532,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="424.532,106.640,444.552,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="444.552,106.640,462.052,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="462.052,106.640,469.640,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="469.640,106.640,487.476,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="487.476,106.640,506.320,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="506.320,106.640,523.820,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="523.820,106.640,541.656,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="541.656,106.640,550.980,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="550.980,106.640,573.548,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="573.548,106.640,591.384,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="591.384,106.640,611.208,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="611.208,106.640,628.960,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="628.960,106.640,642.736,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="642.736,106.640,654.888,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="654.888,106.640,683.168,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="683.168,106.640,701.004,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="701.004,106.640,719.848,156.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="719.848,106.640,732.000,156.760" size="50.120">?</text>
<text>
</text>
</textline>
<textline bbox="424.140,62.640,732.000,112.760">
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="424.140,62.640,447.128,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="447.128,62.640,464.964,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="464.964,62.640,488.764,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="488.764,62.640,498.088,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="498.088,62.640,519.312,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="519.312,62.640,538.072,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="538.072,62.640,551.848,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="551.848,62.640,570.104,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="570.104,62.640,579.428,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="579.428,62.640,602.640,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="602.640,62.640,621.484,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="621.484,62.640,629.072,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="629.072,62.640,646.460,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="646.460,62.640,664.296,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="664.296,62.640,678.072,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="678.072,62.640,694.676,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="694.676,62.640,702.264,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="702.264,62.640,714.416,112.760" size="50.120">?</text>
<text font="DAFPJF+HiraKakuPro-W6" vertical="False" bbox="714.416,62.640,732.000,112.760" size="50.120">?</text>
<text>
</text>
</textline>
</textbox>
<rect linewidth="0" bbox="0.000,0.000,800.000,600.000" />
<line linewidth="8" bbox="50.000,342.000,560.000,342.000" />
<figure name="Im1" bbox="25.000,23.000,66.000,63.000">
<image type="/FlateDecode" width="41" height="40" />
</figure>
</page>
<layout>
<textgroup bbox="62.000,62.640,734.000,523.160">
<textbox id="0" bbox="62.000,365.240,734.000,523.160" />
<textbox id="1" bbox="263.532,62.640,732.000,275.120" />
</textgroup>
</layout>
</pages>

66
samples/simple1.html.ref Normal file

@ -0,0 +1,66 @@
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body>
<span style="position:absolute; border: gray 1px solid; left:0px; top:50px; width:612px; height:792px;"></span>
<div style="position:absolute; top:50px;"><a name="1">Page 1</a></div>
<span style="position:absolute; border: blue 1px solid; left:100px; top:119px; width:61px; height:27px;"></span>
<span style="position:absolute; left:100px; top:119px; font-size:27px;">H</span>
<span style="position:absolute; left:117px; top:119px; font-size:27px;">e</span>
<span style="position:absolute; left:130px; top:119px; font-size:27px;">l</span>
<span style="position:absolute; left:136px; top:119px; font-size:27px;">l</span>
<span style="position:absolute; left:141px; top:119px; font-size:27px;">o</span>
<span style="position:absolute; left:154px; top:119px; font-size:27px;"> </span>
<span style="position:absolute; border: blue 1px solid; left:261px; top:119px; width:62px; height:27px;"></span>
<span style="position:absolute; left:261px; top:119px; font-size:27px;">W</span>
<span style="position:absolute; left:283px; top:119px; font-size:27px;">o</span>
<span style="position:absolute; left:297px; top:119px; font-size:27px;">r</span>
<span style="position:absolute; left:305px; top:119px; font-size:27px;">l</span>
<span style="position:absolute; left:310px; top:119px; font-size:27px;">d</span>
<span style="position:absolute; border: blue 1px solid; left:100px; top:219px; width:61px; height:27px;"></span>
<span style="position:absolute; left:100px; top:219px; font-size:27px;">H</span>
<span style="position:absolute; left:117px; top:219px; font-size:27px;">e</span>
<span style="position:absolute; left:130px; top:219px; font-size:27px;">l</span>
<span style="position:absolute; left:136px; top:219px; font-size:27px;">l</span>
<span style="position:absolute; left:141px; top:219px; font-size:27px;">o</span>
<span style="position:absolute; left:154px; top:219px; font-size:27px;"> </span>
<span style="position:absolute; border: blue 1px solid; left:261px; top:219px; width:62px; height:27px;"></span>
<span style="position:absolute; left:261px; top:219px; font-size:27px;">W</span>
<span style="position:absolute; left:284px; top:219px; font-size:27px;">o</span>
<span style="position:absolute; left:297px; top:219px; font-size:27px;">r</span>
<span style="position:absolute; left:305px; top:219px; font-size:27px;">l</span>
<span style="position:absolute; left:310px; top:219px; font-size:27px;">d</span>
<span style="position:absolute; border: blue 1px solid; left:100px; top:319px; width:111px; height:27px;"></span>
<span style="position:absolute; left:100px; top:319px; font-size:27px;">H</span>
<span style="position:absolute; left:127px; top:319px; font-size:27px;">e</span>
<span style="position:absolute; left:150px; top:319px; font-size:27px;">l</span>
<span style="position:absolute; left:166px; top:319px; font-size:27px;">l</span>
<span style="position:absolute; left:181px; top:319px; font-size:27px;">o</span>
<span style="position:absolute; left:204px; top:319px; font-size:27px;"> </span>
<span style="position:absolute; border: blue 1px solid; left:321px; top:319px; width:102px; height:27px;"></span>
<span style="position:absolute; left:321px; top:319px; font-size:27px;">W</span>
<span style="position:absolute; left:354px; top:319px; font-size:27px;">o</span>
<span style="position:absolute; left:377px; top:319px; font-size:27px;">r</span>
<span style="position:absolute; left:395px; top:319px; font-size:27px;">l</span>
<span style="position:absolute; left:410px; top:319px; font-size:27px;">d</span>
<span style="position:absolute; border: blue 1px solid; left:100px; top:419px; width:111px; height:27px;"></span>
<span style="position:absolute; left:100px; top:419px; font-size:27px;">H</span>
<span style="position:absolute; left:127px; top:419px; font-size:27px;">e</span>
<span style="position:absolute; left:150px; top:419px; font-size:27px;">l</span>
<span style="position:absolute; left:165px; top:419px; font-size:27px;">l</span>
<span style="position:absolute; left:181px; top:419px; font-size:27px;">o</span>
<span style="position:absolute; left:204px; top:419px; font-size:27px;"> </span>
<span style="position:absolute; border: blue 1px solid; left:321px; top:419px; width:102px; height:27px;"></span>
<span style="position:absolute; left:321px; top:419px; font-size:27px;">W</span>
<span style="position:absolute; left:353px; top:419px; font-size:27px;">o</span>
<span style="position:absolute; left:377px; top:419px; font-size:27px;">r</span>
<span style="position:absolute; left:395px; top:419px; font-size:27px;">l</span>
<span style="position:absolute; left:410px; top:419px; font-size:27px;">d</span>
<span style="position:absolute; border: red 1px solid; left:100px; top:119px; width:324px; height:327px;"></span>
<span style="position:absolute; border: red 1px solid; left:100px; top:119px; width:224px; height:127px;"></span>
<span style="position:absolute; border: red 1px solid; left:100px; top:119px; width:223px; height:27px;"></span>
<span style="position:absolute; border: red 1px solid; left:100px; top:219px; width:224px; height:27px;"></span>
<span style="position:absolute; border: red 1px solid; left:100px; top:319px; width:324px; height:127px;"></span>
<span style="position:absolute; border: red 1px solid; left:100px; top:319px; width:324px; height:27px;"></span>
<span style="position:absolute; border: red 1px solid; left:100px; top:419px; width:323px; height:27px;"></span>
<div style="position:absolute; top:0px;">Page: <a href="#1">1</a></div>
</body></html>

17
samples/simple1.txt.ref Normal file

@ -0,0 +1,17 @@
Hello
World
Hello
World
H e l l o
W o r l d
H e l l o
W o r l d

139
samples/simple1.xml.ref Normal file

@ -0,0 +1,139 @@
<?xml version="1.0" encoding="utf-8" ?>
<pages>
<page id="1" bbox="0.000,0.000,612.000,792.000" rotate="0">
<textbox id="0" bbox="100.000,695.032,161.344,722.776">
<textline bbox="100.000,695.032,161.344,722.776">
<text font="Helvetica" vertical="False" bbox="100.000,695.032,117.328,722.776" size="27.744">H</text>
<text font="Helvetica" vertical="False" bbox="117.328,695.032,130.672,722.776" size="27.744">e</text>
<text font="Helvetica" vertical="False" bbox="130.672,695.032,136.000,722.776" size="27.744">l</text>
<text font="Helvetica" vertical="False" bbox="136.000,695.032,141.328,722.776" size="27.744">l</text>
<text font="Helvetica" vertical="False" bbox="141.328,695.032,154.672,722.776" size="27.744">o</text>
<text font="Helvetica" vertical="False" bbox="154.672,695.032,161.344,722.776" size="27.744"> </text>
<text>
</text>
</textline>
</textbox>
<textbox id="1" bbox="261.328,695.032,323.992,722.776">
<textline bbox="261.328,695.032,323.992,722.776">
<text font="Helvetica" vertical="False" bbox="261.328,695.032,283.984,722.776" size="27.744">W</text>
<text font="Helvetica" vertical="False" bbox="283.984,695.032,297.328,722.776" size="27.744">o</text>
<text font="Helvetica" vertical="False" bbox="297.328,695.032,305.320,722.776" size="27.744">r</text>
<text font="Helvetica" vertical="False" bbox="305.320,695.032,310.648,722.776" size="27.744">l</text>
<text font="Helvetica" vertical="False" bbox="310.648,695.032,323.992,722.776" size="27.744">d</text>
<text>
</text>
</textline>
</textbox>
<textbox id="2" bbox="100.000,595.032,161.344,622.776">
<textline bbox="100.000,595.032,161.344,622.776">
<text font="Helvetica" vertical="False" bbox="100.000,595.032,117.328,622.776" size="27.744">H</text>
<text font="Helvetica" vertical="False" bbox="117.328,595.032,130.672,622.776" size="27.744">e</text>
<text font="Helvetica" vertical="False" bbox="130.672,595.032,136.000,622.776" size="27.744">l</text>
<text font="Helvetica" vertical="False" bbox="136.000,595.032,141.328,622.776" size="27.744">l</text>
<text font="Helvetica" vertical="False" bbox="141.328,595.032,154.672,622.776" size="27.744">o</text>
<text font="Helvetica" vertical="False" bbox="154.672,595.032,161.344,622.776" size="27.744"> </text>
<text>
</text>
</textline>
</textbox>
<textbox id="3" bbox="261.344,595.032,324.008,622.776">
<textline bbox="261.344,595.032,324.008,622.776">
<text font="Helvetica" vertical="False" bbox="261.344,595.032,284.000,622.776" size="27.744">W</text>
<text font="Helvetica" vertical="False" bbox="284.000,595.032,297.344,622.776" size="27.744">o</text>
<text font="Helvetica" vertical="False" bbox="297.344,595.032,305.336,622.776" size="27.744">r</text>
<text font="Helvetica" vertical="False" bbox="305.336,595.032,310.664,622.776" size="27.744">l</text>
<text font="Helvetica" vertical="False" bbox="310.664,595.032,324.008,622.776" size="27.744">d</text>
<text>
</text>
</textline>
</textbox>
<textbox id="4" bbox="100.000,495.032,211.344,522.776">
<textline bbox="100.000,495.032,211.344,522.776">
<text font="Helvetica" vertical="False" bbox="100.000,495.032,117.328,522.776" size="27.744">H</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="127.328,495.032,140.672,522.776" size="27.744">e</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="150.672,495.032,156.000,522.776" size="27.744">l</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="166.000,495.032,171.328,522.776" size="27.744">l</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="181.328,495.032,194.672,522.776" size="27.744">o</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="204.672,495.032,211.344,522.776" size="27.744"> </text>
<text>
</text>
</textline>
</textbox>
<textbox id="5" bbox="321.344,495.032,424.008,522.776">
<textline bbox="321.344,495.032,424.008,522.776">
<text font="Helvetica" vertical="False" bbox="321.344,495.032,344.000,522.776" size="27.744">W</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="354.000,495.032,367.344,522.776" size="27.744">o</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="377.344,495.032,385.336,522.776" size="27.744">r</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="395.336,495.032,400.664,522.776" size="27.744">l</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="410.664,495.032,424.008,522.776" size="27.744">d</text>
<text>
</text>
</textline>
</textbox>
<textbox id="6" bbox="100.000,395.032,211.264,422.776">
<textline bbox="100.000,395.032,211.264,422.776">
<text font="Helvetica" vertical="False" bbox="100.000,395.032,117.328,422.776" size="27.744">H</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="127.312,395.032,140.656,422.776" size="27.744">e</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="150.640,395.032,155.968,422.776" size="27.744">l</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="165.952,395.032,171.280,422.776" size="27.744">l</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="181.264,395.032,194.608,422.776" size="27.744">o</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="204.592,395.032,211.264,422.776" size="27.744"> </text>
<text>
</text>
</textline>
</textbox>
<textbox id="7" bbox="321.232,395.032,423.832,422.776">
<textline bbox="321.232,395.032,423.832,422.776">
<text font="Helvetica" vertical="False" bbox="321.232,395.032,343.888,422.776" size="27.744">W</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="353.872,395.032,367.216,422.776" size="27.744">o</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="377.200,395.032,385.192,422.776" size="27.744">r</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="395.176,395.032,400.504,422.776" size="27.744">l</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="410.488,395.032,423.832,422.776" size="27.744">d</text>
<text>
</text>
</textline>
</textbox>
</page>
<layout>
<textgroup bbox="100.000,395.032,424.008,722.776">
<textgroup bbox="100.000,595.032,324.008,722.776">
<textgroup bbox="100.000,695.032,323.992,722.776">
<textbox id="0" bbox="100.000,695.032,161.344,722.776" />
<textbox id="1" bbox="261.328,695.032,323.992,722.776" />
</textgroup>
<textgroup bbox="100.000,595.032,324.008,622.776">
<textbox id="2" bbox="100.000,595.032,161.344,622.776" />
<textbox id="3" bbox="261.344,595.032,324.008,622.776" />
</textgroup>
</textgroup>
<textgroup bbox="100.000,395.032,424.008,522.776">
<textgroup bbox="100.000,495.032,424.008,522.776">
<textbox id="4" bbox="100.000,495.032,211.344,522.776" />
<textbox id="5" bbox="321.344,495.032,424.008,522.776" />
</textgroup>
<textgroup bbox="100.000,395.032,423.832,422.776">
<textbox id="6" bbox="100.000,395.032,211.264,422.776" />
<textbox id="7" bbox="321.232,395.032,423.832,422.776" />
</textgroup>
</textgroup>
</textgroup>
</layout>
</pages>
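
Note on reading simple1.xml.ref: each `<textline>` holds one `<text>` element per glyph (with `font`, `bbox` and `size` attributes) plus attribute-less `<text>` elements for inserted spaces and the trailing newline. The sketch below is a minimal illustration, assuming only the structure visible in the hunk above. It rebuilds a line's string with the standard-library `xml.etree.ElementTree`; the helper names `textline_to_string` and `glyph_count` are hypothetical and are not part of pdfminer. The sample is abridged from textbox id="4".

```python
import xml.etree.ElementTree as ET

# Abridged fragment of textbox id="4" from simple1.xml.ref (first two glyphs only).
SAMPLE = """\
<textline bbox="100.000,495.032,211.344,522.776">
<text font="Helvetica" vertical="False" bbox="100.000,495.032,117.328,522.776" size="27.744">H</text>
<text> </text>
<text font="Helvetica" vertical="False" bbox="127.328,495.032,140.672,522.776" size="27.744">e</text>
<text>
</text>
</textline>
"""


def textline_to_string(textline):
    """Concatenate the character data of every <text> child in document order."""
    return "".join(child.text or "" for child in textline.iter("text"))


def glyph_count(textline):
    """Count only <text> elements that carry a bbox, i.e. glyphs found on the page."""
    return sum(1 for child in textline.iter("text") if "bbox" in child.attrib)


line = ET.fromstring(SAMPLE)
print(repr(textline_to_string(line)))
print(glyph_count(line))
```

Separating glyphs from inserted whitespace by the presence of `bbox` is an assumption based on this reference file alone; other outputs may need a different test.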

11
samples/simple2.html.ref Normal file

@ -0,0 +1,11 @@
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body>
<span style="position:absolute; border: gray 1px solid; left:0px; top:50px; width:612px; height:792px;"></span>
<div style="position:absolute; top:50px;"><a name="1">Page 1</a></div>
<span style="position:absolute; border: black 1px solid; left:150px; top:492px; width:0px; height:100px;"></span>
<span style="position:absolute; border: black 1px solid; left:150px; top:592px; width:250px; height:0px;"></span>
<span style="position:absolute; border: black 1px solid; left:200px; top:467px; width:50px; height:75px;"></span>
<span style="position:absolute; border: black 1px solid; left:300px; top:442px; width:100px; height:100px;"></span>
<div style="position:absolute; top:0px;">Page: <a href="#1">1</a></div>
</body></html>

1
samples/simple2.txt.ref Normal file

@ -0,0 +1 @@

9
samples/simple2.xml.ref Normal file

@ -0,0 +1,9 @@
<?xml version="1.0" encoding="utf-8" ?>
<pages>
<page id="1" bbox="0.000,0.000,612.000,792.000" rotate="0">
<line linewidth="0" bbox="150.000,250.000,150.000,350.000" />
<line linewidth="4" bbox="150.000,250.000,400.000,250.000" />
<rect linewidth="1" bbox="200.000,300.000,250.000,375.000" />
<polygon linewidth="1" bbox="300.000,300.000,400.000,400.000" pts="300.000,300.000,300.000,400.000,400.000,400.000,400.000,300.000"/>
</page>
</pages>
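
Note on how the two simple2 references relate: the XML file gives each `bbox` as `x0,y0,x1,y1` with the origin at the bottom-left of the page, while the HTML file positions the same shapes with CSS `left`/`top`/`width`/`height` measured from the top. The sketch below is inferred purely from the numbers in the hunks above, not from pdfminer's source. It assumes a vertical flip against the page height (792), a 50px offset matching the page border's `top:50px`, and truncation to whole pixels. The function name `bbox_to_css` and both constants are hypothetical.

```python
PAGE_HEIGHT = 792.0  # from <page bbox="0.000,0.000,612.000,792.000">
Y_OFFSET = 50        # assumed: the page border in the HTML reference starts at top:50px


def bbox_to_css(bbox, page_height=PAGE_HEIGHT, y_offset=Y_OFFSET):
    """Map an 'x0,y0,x1,y1' bbox (bottom-left origin) to CSS pixel values."""
    x0, y0, x1, y1 = (float(v) for v in bbox.split(","))
    return {
        "left": int(x0),
        # Flip the y axis: the top edge of the box is y1 in page coordinates.
        "top": int(page_height - y1 + y_offset),
        "width": int(x1 - x0),
        "height": int(y1 - y0),
    }


# The <rect> from simple2.xml.ref.
print(bbox_to_css("200.000,300.000,250.000,375.000"))
```

Under these assumptions the rect maps to `left:200px; top:467px; width:50px; height:75px`, which agrees with the matching span in simple2.html.ref. The same mapping reproduces the two lines, the polygon, and the first text box of simple1.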