Principled Approaches to Automatic Text Summarization

Peyrard, Maxime (2019)
Principled Approaches to Automatic Text Summarization.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication

Peyrard_Maxime_PhD_Thesis.pdf
Copyright Information: CC BY-NC-ND 4.0 International - Creative Commons, Attribution NonCommercial, NoDerivs.

Item Type:

Ph.D. Thesis

Type of entry:

Primary publication

Title:

Principled Approaches to Automatic Text Summarization

Language:

English

Referees:

Gurevych, Prof. Dr. Iryna ; Fürnkranz, Prof. Dr. Johannes ; Nenkova, Prof. Dr. Ani

Date:

20 August 2019

Place of Publication:

Darmstadt

Date of oral examination:

29 January 2019

Abstract:

Automatic text summarization is a particularly challenging Natural Language Processing (NLP) task involving natural language understanding, content selection and natural language generation. In this thesis, we concentrate on the content selection aspect, the inherent problem of summarization, which is governed by the notion of information Importance. We present a simple and intuitive formulation of the summarization task as two components: a summary scoring function θ measuring how good a text is as a summary of the given sources, and an optimization technique O extracting a summary with a high score according to θ. This perspective offers interesting insights into previous summarization efforts and allows us to pinpoint promising research directions. In particular, we realize that previous work heavily constrained the summary scoring function in order to solve convenient optimization problems (e.g., Integer Linear Programming). We question this assumption and demonstrate that General Purpose Optimization (GPO) techniques like genetic algorithms are practical. These GPOs do not require mathematical properties from the objective function and, thus, the summary scoring function can be freed from its previously imposed constraints. Additionally, the summary scoring function can be evaluated on its own based on its ability to correlate with humans. This offers a principled way of examining the inner workings of summarization systems and complements the traditional evaluations of the extracted summaries. In fact, evaluation metrics are also summary scoring functions which should correlate well with humans. Thus, the two main challenges of summarization, the evaluation and the development of summarizers, are unified within the same setup: discovering strong summary scoring functions. Hence, we investigated ways of uncovering such functions. First, we conducted an empirical study of learning the summary scoring function from data.
The results show that an unconstrained summary scoring function is better able to correlate with humans. Furthermore, an unconstrained summary scoring function optimized approximately with GPO extracts better summaries than a constrained summary scoring function optimized exactly with, e.g., ILP. Along the way, we proposed techniques to leverage the small and biased human judgment datasets. Additionally, we released a new evaluation metric explicitly trained to maximize its correlation with humans. Second, we developed a theoretical formulation of the notion of Importance. In a framework rooted in information theory, we defined the quantities Redundancy, Relevance and Informativeness. Importance arises as the notion unifying these concepts. More generally, Importance is the measure that guides which choices to make when information must be discarded. Finally, evaluation remains an open problem with a massive impact on summarization progress. Thus, we conducted experiments on available human judgment datasets commonly used to compare evaluation metrics. We discovered that these datasets do not cover the high-quality range in which summarization systems and evaluation metrics operate. This motivates efforts to collect human judgments for high-scoring summaries, as this would be necessary to settle the debate over which metric to use. This would also be greatly beneficial for improving summarization systems and metrics alike.
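
The two-component view described in the abstract (a scoring function θ plus an optimizer O that treats θ as a black box) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names `theta` and `genetic_optimizer`, the toy coverage-minus-redundancy score, the word budget, and all hyperparameters are hypothetical. It only shows that a genetic algorithm needs nothing from θ beyond the ability to evaluate it.

```python
import random
from collections import Counter


def theta(summary, sources):
    """Toy scoring function (hypothetical, not the thesis's theta):
    share of source word mass covered, minus a repetition penalty."""
    if not summary:
        return 0.0
    src = Counter(w for s in sources for w in s.lower().split())
    summ = Counter(w for s in summary for w in s.lower().split())
    coverage = sum(src[w] for w in summ) / sum(src.values())
    redundancy = 1.0 - len(summ) / sum(summ.values())
    return coverage - 0.5 * redundancy


def fitness(mask, sources, budget):
    """Score a 0/1 sentence-selection mask; over-budget masks are infeasible."""
    chosen = [s for s, bit in zip(sources, mask) if bit]
    if sum(len(s.split()) for s in chosen) > budget:
        return float("-inf")
    return theta(chosen, sources)


def genetic_optimizer(sources, budget, pop_size=30, generations=40, seed=0):
    """Black-box search over sentence subsets (assumes at least 2 sentences)."""
    rng = random.Random(seed)
    n = len(sources)
    # The empty selection is always feasible, so a feasible candidate exists.
    population = [[0] * n] + [
        [int(rng.random() < 0.3) for _ in range(n)] for _ in range(pop_size - 1)
    ]

    def score(mask):
        return fitness(mask, sources, budget)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # elitism: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)            # single-bit mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return [s for s, bit in zip(sources, best) if bit]


if __name__ == "__main__":
    docs = [
        "Summarization selects important content from source documents",
        "A scoring function measures how good a summary is",
        "An optimizer searches for a summary with a high score",
        "Genetic algorithms treat the scoring function as a black box",
    ]
    print(genetic_optimizer(docs, budget=18))
```

Because the optimizer only calls `theta`, the toy score could be swapped for any other scoring function, including a learned one, without changing the search code.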

Alternative Abstract:

Automatic text summarization is a particularly challenging task in Natural Language Processing (NLP). In this thesis, we concentrate on the content selection aspect, the inherent problem of summarization, which is governed by the notion of information Importance. We present a simple and intuitive formulation of the summarization task as two components: an objective function θ, which measures how good a text is as a summary of the given sources, and an optimization technique O, which extracts a summary with a high θ score. This perspective offers interesting insights and allows us to identify research directions. In particular, we find that previous work heavily constrained the objective function in order to solve convenient optimization problems (e.g., Integer Linear Programming). We question this assumption and demonstrate that General Purpose Optimization (GPO) techniques such as genetic algorithms are practical. In addition, the objective function can be evaluated on its own. This offers a principled way of examining the inner workings of summarization systems. We investigate techniques for discovering strong objective functions. First, we conducted an empirical study in which the summary scoring function was learned from data. The results show that an unconstrained objective function correlates better with humans. Second, we developed a theoretical formulation of the notion of Importance. In a framework rooted in information theory, we defined the quantities Redundancy, Relevance and Informativeness. Importance arises as the notion that unifies these concepts. More generally, Importance is the measure that guides which choices to make when information must be discarded.
Finally, the evaluation of systems remains an open problem with a massive impact on summarization progress. We therefore conducted experiments on available human judgment datasets that are commonly used to compare evaluation metrics. We found that these datasets do not cover the high-quality range in which summarization systems and evaluation metrics operate. This motivates efforts to collect human judgments for high-scoring summaries, as this would be necessary to settle the debate over which metric to use. It would also be greatly beneficial for improving systems and metrics.

Original language of the alternative abstract:

German

URN:

urn:nbn:de:tuda-tuprints-90127

Classification DDC:

000 Generalities, computers, information > 004 Computer science

Divisions:

20 Department of Computer Science > Ubiquitous Knowledge Processing

Date Deposited:

25 Oct 2019 06:58

Last Modified:

09 Jul 2020 02:43

URI: