Dependency Graph Based Sentence Fusion and Compression.
Technische Universität, Darmstadt
[Ph.D. Thesis], (2010)
PhD thesis (Sentence fusion and compression)
Available under Creative Commons Attribution Non-commercial No Derivatives, 2.5.
|Item Type:||Ph.D. Thesis|
|Title:||Dependency Graph Based Sentence Fusion and Compression|
The popularity of text summarization (TS) in the NLP community has been steadily increasing in recent years. This is not surprising given its practical utility: multi-document summarization systems, for example, would be of great use given the enormous amount of news published online every day. Although TS methods vary considerably, most of them share one important property: they are extractive, and the most common extraction unit is the sentence; that is, most TS systems build summaries from extracted sentences. The extractive strategy has a well-recognized drawback: sentences pulled from different documents may overlap but also complement each other. As a consequence, extractive systems are often unable to produce summaries which are complete and non-redundant at the same time. Sentence fusion is a text-to-text generation technique which addresses exactly this problem. Sentence fusion systems take a set of related documents as input and output sentences "fused" from the dependency structures of similar sentences. In this thesis we present a novel sentence fusion system which advances TS towards abstractive summarization by building a global representation of input sentences and generating a new sentence from this representation. The sentence fusion process includes two main tasks, dependency tree construction and dependency tree linearization, both of which we solve in a novel and effective way. Our tree construction method is largely unsupervised and generates grammatical sentences by taking syntactic and semantic knowledge into account without reliance on hand-crafted rules. Tree linearization is accomplished with a method that extends previous approaches but requires little overgeneration compared with them. Our method is also significantly more accurate than the previous ones because it utilizes features from several levels of linguistic organization (syntax, semantics, information structure).
We test our system on a corpus of comparable biographies in German and obtain good readability results in an evaluation with native speakers. We also apply the same method to sentence compression (i.e., the task of producing a summary of a single sentence) in English and German and obtain results comparable to those reported by recent systems designed exclusively for this task.
|Place of Publication:||Darmstadt|
|Classification DDC:||400 Language > 400 Language, Linguistics|
|Divisions:||02 Department History and Social Science|
|Date Deposited:||08 Jun 2010 09:37|
|Last Modified:||07 Dec 2012 11:57|
|Referees:||Teich, Prof. Dr. Elke and Lapata, Dr. Mirella|
|Refereed:||9 October 2009|