Utama, Prasetya Ajie (2024)
Robustness of Pre-trained Language Models for Natural Language Understanding.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00026582
Ph.D. Thesis, Primary publication, Publisher's Version
PhD_Thesis__Robustness_of_NLU_FinalSubmit_2023.pdf (2 MB). Copyright Information: CC BY-SA 4.0 International (Creative Commons, Attribution-ShareAlike).
| Item Type: | Ph.D. Thesis |
| --- | --- |
| Type of entry: | Primary publication |
| Title: | Robustness of Pre-trained Language Models for Natural Language Understanding |
| Language: | English |
| Referees: | Gurevych, Prof. Dr. Iryna; Moosavi, Prof. Dr. Nafise Sadat; Schwartz, Prof. Dr. Roy |
| Date: | 5 February 2024 |
| Place of Publication: | Darmstadt |
| Collation: | xi, 131 pages |
| Date of oral examination: | 24 October 2023 |
| DOI: | 10.26083/tuprints-00026582 |
| Abstract: | Recent advances in neural network architectures and large-scale language model pretraining have enabled Natural Language Understanding (NLU) systems to surpass human-level performance on various benchmark datasets. However, a large body of work has revealed that NLU models are brittle against examples from outside the training data distribution, which consequently limits their real-world application. This brittleness is mainly attributed to models exploiting spurious correlations in the training dataset. That is, models learn to use cues or shortcuts rather than robust features that are representative of the underlying task. In this thesis, we present several methods to alleviate the effect of spurious correlations on the resulting NLU models. We approach robustness against spurious correlations from several directions. First, we address issues in modeling methods that “debias” NLU models by reducing their incentive to learn non-robust features. We introduce a regularization method that uses existing knowledge about the characteristics of spurious features to improve out-of-distribution generalization without degrading performance on the standard evaluation. We further propose a strategy to maintain the effectiveness of debiasing methods when the required prior knowledge is not available. Specifically, we introduce a self-debiasing framework that identifies potentially biased examples that models should be discouraged from exploiting. Next, we examine the inherent robustness that language models acquire during pre-training on large text corpora. We show how task-specific fine-tuning can be destructive to this robustness and propose a novel regularization approach to alleviate the degradation. Lastly, we turn to data augmentation approaches that aim to improve the robust performance of NLU models on downstream application tasks. We present a method to automatically generate diverse and naturalistic examples from which models can reliably learn the task. In all task settings presented in this thesis, models are evaluated against out-of-distribution examples designed to penalize reliance on spurious correlations. We measure the improvement in robustness as the increase in performance on these examples without degradation on the existing standard evaluation. Overall, the work in this thesis demonstrates that we can still obtain robust NLU models using improved modeling and augmentation despite the presence of spurious correlations in the existing training resources. |
| Status: | Publisher's Version |
| URN: | urn:nbn:de:tuda-tuprints-265828 |
| Classification DDC: | 000 Generalities, computers, information > 004 Computer science; 600 Technology, medicine, applied sciences > 600 Technology; 600 Technology, medicine, applied sciences > 620 Engineering and machine engineering |
| Divisions: | 20 Department of Computer Science > Ubiquitous Knowledge Processing |
| Date Deposited: | 05 Feb 2024 13:03 |
| Last Modified: | 06 Feb 2024 07:45 |
| URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/26582 |
| PPN: | 515266132 |