Puzikov, Yevgeniy (2021)
Principled Approach to Natural Language Generation.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00019115
Ph.D. Thesis, Primary publication, Publisher's Version
Text: Puzikov_Yevgeniy_20210707.pdf
Copyright Information: CC BY-NC-ND 4.0 International - Creative Commons, Attribution NonCommercial, NoDerivs.
Item Type: | Ph.D. Thesis
---|---
Type of entry: | Primary publication
Title: | Principled Approach to Natural Language Generation
Language: | English
Referees: | Gurevych, Prof. Dr. Iryna ; Dagan, Prof. Ido ; Gardent, Prof. Dr. Claire
Date: | 2021
Place of Publication: | Darmstadt
Collation: | viii, 172 pages
Date of oral examination: | 22 June 2021
DOI: | 10.26083/tuprints-00019115
Abstract: | The research field of Natural Language Generation offers practitioners a wide range of techniques for producing texts from a variety of data types. These techniques find their way into various real-world applications and help many people automate time-consuming text production tasks in many areas. At the moment, the design and evaluation of text generation approaches is largely empirical. Many systems are developed to solve one particular task and work on a single data type, which makes it hard to compare an approach to any other technique and critically evaluate its performance. Some systems employ complex machine learning algorithms to learn rich data representations and perform joint modeling of the steps involved in the process of text generation. Such approaches offer an attractive trade-off between development costs and output quality, but often lack transparency when it comes to reasoning about the behavior of the system. The number of proposed approaches constantly grows, but the methodology lags behind and sometimes fails to provide a better understanding of which approaches work and why. In this thesis we present our view on the task of text production from a methodological point of view. We analyze the existing scientific literature and examine common text generation approaches and the established evaluation protocols. We further propose a principled view on the problem: we break it into components, examine their interaction, and develop a set of recommendations intended to assist in the design or analysis of a study. We then conduct a range of experiments to test this framework on several text generation tasks. First, we show that task specification analysis sometimes allows one to solve the problem at hand with very simple techniques, without resorting to the complex machinery of advanced statistical learning methods. We further demonstrate the potential of the developed framework to find discrepancies in the established evaluation protocols. We show that sometimes neither automatic metrics nor conventional human evaluation is sufficient to draw conclusions about system performance. We demonstrate how a system can fit the data to achieve high automatic metric scores while falling short in terms of actual output quality. Finally, we use the framework to demonstrate how one can develop effective text generation systems without sacrificing the transparency of their inner working logic, making the developed systems both accurate and reliable.
Status: | Publisher's Version
URN: | urn:nbn:de:tuda-tuprints-191154
Classification DDC: | 000 Generalities, computers, information > 004 Computer science
Divisions: | 20 Department of Computer Science > Ubiquitous Knowledge Processing
TU-Projects: | Bund/BMBF, 01IS17050, Software Campus 2.0; DFG, GRK1994, Gurevych_GRK_1994_Au; DFG, GU798/17-1, Deutsch-Israelische
Date Deposited: | 19 Jul 2021 11:33
Last Modified: | 19 Jul 2021 11:34
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/19115
PPN: | 483259012