pdfminer.six/samples/nonfree/naacl06-shinyama.html.ref

84 lines
7.6 KiB
Plaintext

<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body>
<span style="position:absolute; border: gray 1px solid; left:0px; top:50px; width:595px; height:842px;"></span>
<div style="position:absolute; top:50px;"><a name="1">Page 1</a></div>
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:79px; top:125px; width:452px; height:18px;"><span style="font-family: MZSZGI+NimbusRomNo9L-Medi; font-size:18px">Preemptive Information Extraction using Unrestricted Relation Discovery
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:148px; top:164px; width:91px; height:15px;"><span style="font-family: MZSZGI+NimbusRomNo9L-Medi; font-size:15px">Yusuke Shinyama
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:389px; top:164px; width:74px; height:15px;"><span style="font-family: MZSZGI+NimbusRomNo9L-Medi; font-size:15px">Satoshi Sekine
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:255px; top:187px; width:101px; height:14px;"><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:14px">New York University
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:244px; top:201px; width:122px; height:14px;"><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:14px">715, Broadway, 7th Floor
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:252px; top:215px; width:106px; height:14px;"><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:14px">New York, NY, 10003
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:222px; top:224px; width:168px; height:18px;"><span style="font-family: ZNQAHA+CMSY10; font-size:18px">{</span><span style="font-family: CXOZYQ+NimbusMonL-Regu; font-size:11px">yusuke,sekine</span><span style="font-family: ZNQAHA+CMSY10; font-size:18px">}</span><span style="font-family: CXOZYQ+NimbusMonL-Regu; font-size:11px">@cs.nyu.edu
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:163px; top:281px; width:44px; height:15px;"><span style="font-family: MZSZGI+NimbusRomNo9L-Medi; font-size:15px">Abstract
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:93px; top:310px; width:183px; height:175px;"><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">We are trying to extend the boundary of
<br>Information Extraction (IE) systems. Ex-
<br>isting IE systems require a lot of time and
<br>human effort to tune for a new scenario.
<br>Preemptive Information Extraction is an
<br></span><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">attempt to automatically create all feasible
<br></span><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">IE systems in advance without human in-
<br>tervention. We propose a technique called
<br>Unrestricted Relation Discovery that dis-
<br>covers all possible relations from texts and
<br>presents them as tables. We present a pre-
<br>liminary system that obtains reasonably
<br>good results.
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:71px; top:505px; width:80px; height:15px;"><span style="font-family: MZSZGI+NimbusRomNo9L-Medi; font-size:15px">1 Background
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:71px; top:528px; width:226px; height:243px;"><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">Every day, a large number of news articles are cre-
<br>ated and reported, many of which are unique. But
<br>certain types of events, such as hurricanes or mur-
<br>ders, are reported again and again throughout a year.
<br>The goal of Information Extraction, or IE, is to re-
<br>trieve a certain type of news event from past articles
<br>and present the events as a table whose columns are
<br></span><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">filled with a name of a person or company, accord-
<br></span><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">ing to its role in the event. However, existing IE
<br>techniques require a lot of human labor. First, you
<br>have to specify the type of information you want and
<br>collect articles that include this information. Then,
<br>you have to analyze the articles and manually craft
<br>a set of patterns to capture these events. Most exist-
<br>ing IE research focuses on reducing this burden by
<br>helping people create such patterns. But each time
<br>you want to extract a different kind of information,
<br>you need to repeat the whole process: specify arti-
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:313px; top:284px; width:226px; height:94px;"><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">cles and adjust its patterns, either manually or semi-
<br>automatically. There is a bit of a dangerous pitfall
<br>here. First, it is hard to estimate how good the sys-
<br>tem can be after months of work. Furthermore, you
<br>might not know if the task is even doable in the first
<br>place. Knowing what kind of information is easily
<br>obtained in advance would help reduce this risk.
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:313px; top:379px; width:226px; height:175px;"><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">An IE task can be defined as finding a relation
<br></span><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">among several entities involved in a certain type of
<br>event. For example, in the MUC-6 management
<br>succession scenario, one seeks a relation between
<br>COMPANY, PERSON and POST involved with hir-
<br>ing/firing events. For each row of an extracted ta-
<br>ble, you can always read it as “COMPANY hired
<br>(or fired) PERSON for POST.” The relation between
<br>these entities is retained throughout the table. There
<br>are many existing works on obtaining extraction pat-
<br>terns for pre-defined relations (Riloff, 1996; Yangar-
<br>ber et al., 2000; Agichtein and Gravano, 2000; Sudo
<br>et al., 2003).
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:313px; top:555px; width:226px; height:216px;"><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">Unrestricted Relation Discovery is a technique to
<br>automatically discover such relations that repeatedly
<br>appear in a corpus and present them as a table, with
<br>absolutely no human intervention. Unlike most ex-
<br>isting IE research, a user does not specify the type
<br></span><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">of articles or information wanted. Instead, a system
<br></span><span style="font-family: QTLIUY+NimbusRomNo9L-Regu; font-size:13px">tries to find all the kinds of relations that are reported
<br>multiple times and can be reported in tabular form.
<br>This technique will open up the possibility of try-
<br>ing new IE scenarios. Furthermore, the system itself
<br>can be used as an IE system, since an obtained re-
<br>lation is already presented as a table. If this system
<br>works to a certain extent, tuning an IE system be-
<br>comes a search problem: all the tables are already
<br>built “preemptively.” A user only needs to search
<br>for a relevant table.
<br></span></div><div style="position:absolute; top:0px;">Page: <a href="#1">1</a></div>
</body></html>