Ruan, Qian; Kuznetsov, Ilia; Gurevych, Iryna
eds.: Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung (2024)
Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions.
The 2024 Conference on Empirical Methods in Natural Language Processing. Miami, Florida (12.11.2024-16.11.2024)
doi: 10.26083/tuprints-00028924
Conference or Workshop Item, Secondary publication, Publisher's Version
Text: 2024.emnlp-main.839.pdf (Copyright Information: CC BY 4.0 International - Creative Commons, Attribution)
| Item Type: | Conference or Workshop Item |
|---|---|
| Type of entry: | Secondary publication |
| Title: | Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions |
| Language: | English |
| Date: | 17 December 2024 |
| Place of Publication: | Darmstadt |
| Year of primary publication: | November 2024 |
| Place of primary publication: | Kerrville, TX, USA |
| Publisher: | ACL |
| Book Title: | Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing |
| Event Title: | The 2024 Conference on Empirical Methods in Natural Language Processing |
| Event Location: | Miami, Florida |
| Event Dates: | 12.11.2024-16.11.2024 |
| DOI: | 10.26083/tuprints-00028924 |
| Origin: | Secondary publication service |
| Abstract: | Classification is a core NLP task architecture with many potential applications. While large language models (LLMs) have brought substantial advancements in text generation, their potential for enhancing classification tasks remains underexplored. To address this gap, we propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches. We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task. Our extensive experiments and systematic comparisons with various training approaches and a representative selection of LLMs yield new insights into their application for EIC. We investigate the generalizability of these findings on five further classification tasks. To demonstrate the proposed methods and address the data shortage for empirical edit analysis, we use our best-performing EIC model to create Re3-Sci2.0, a new large-scale dataset of 1,780 scientific document revisions with over 94k labeled edits. The quality of the dataset is assessed through human evaluation. The new dataset enables an in-depth empirical study of human editing behavior in academic writing. We make our experimental framework, models and data publicly available. |
| Status: | Publisher's Version |
| URN: | urn:nbn:de:tuda-tuprints-289246 |
| Classification DDC: | 000 Generalities, computers, information > 004 Computer science |
| Divisions: | 20 Department of Computer Science > Ubiquitous Knowledge Processing; Zentrale Einrichtungen > hessian.AI - The Hessian Center for Artificial Intelligence |
| Date Deposited: | 17 Dec 2024 16:39 |
| Last Modified: | 19 Dec 2024 09:01 |
| URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/28924 |
| PPN: | 524710422 |
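The abstract distinguishes generation-based from encoding-based approaches to using LLMs as classifiers. As a minimal, purely illustrative sketch (not the paper's implementation), the Python snippet below shows both patterns with Hugging Face transformers; the model checkpoint, label set, and example edit are assumptions introduced here for demonstration.

```python
# Illustrative sketch only: generation- vs. encoding-based LLM classification.
# The checkpoint, labels, and example edit are assumptions, not the paper's setup.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed small checkpoint for illustration
LABELS = ["Grammar", "Clarity", "Fact/Evidence", "Claim", "Other"]  # assumed label set

old = "The results shows a improvement."
new = "The results show a statistically significant improvement."

tok = AutoTokenizer.from_pretrained(MODEL)

# Generation-based: prompt a causal LM and read the predicted label from its output text.
gen = AutoModelForCausalLM.from_pretrained(MODEL)
prompt = (
    f"Classify the intent of this revision as one of {', '.join(LABELS)}.\n"
    f"Old: {old}\nNew: {new}\nIntent:"
)
ids = tok(prompt, return_tensors="pt")
out = gen.generate(**ids, max_new_tokens=5, do_sample=False)
print("generation-based:", tok.decode(out[0, ids["input_ids"].shape[1]:], skip_special_tokens=True))

# Encoding-based: attach a classification head over the LLM's representations and
# score a fixed label set (in practice this head is fine-tuned before use).
enc = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=len(LABELS))
enc.config.pad_token_id = tok.pad_token_id or tok.eos_token_id
inputs = tok(old, new, return_tensors="pt")  # encode the old/new edit pair as one input
with torch.no_grad():
    logits = enc(**inputs).logits
print("encoding-based:", LABELS[logits.argmax(dim=-1).item()])
```

In the generation-based setting the label is produced as free text; in the encoding-based setting the model's hidden states feed a classification head that scores a closed label set, which is the distinction the abstract's framework compares.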