
Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions

Ruan, Qian ; Kuznetsov, Ilia ; Gurevych, Iryna (2024):
Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions.
In: Al-Onaizan, Yaser ; Bansal, Mohit ; Chen, Yun-Nung (eds.): The 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, 12.11.2024-16.11.2024.
DOI: 10.26083/tuprints-00028924
Conference or Workshop Item, Secondary publication, Publisher's Version

Text: 2024.emnlp-main.839.pdf (1 MB)
Copyright Information: CC BY 4.0 International - Creative Commons, Attribution.
Item Type: Conference or Workshop Item
Type of entry: Secondary publication
Title: Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions
Language: English
Date: 17 December 2024
Place of Publication: Darmstadt
Year of primary publication: November 2024
Place of primary publication: Kerrville, TX, USA
Publisher: ACL
Book Title: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Event Title: The 2024 Conference on Empirical Methods in Natural Language Processing
Event Location: Miami, Florida
Event Dates: 12.11.2024-16.11.2024
DOI: 10.26083/tuprints-00028924
Origin: Secondary publication service
Abstract:

Classification is a core NLP task with many potential applications. While large language models (LLMs) have brought substantial advancements in text generation, their potential for enhancing classification tasks remains underexplored. To address this gap, we propose a framework for thoroughly investigating the fine-tuning of LLMs for classification, covering both generation- and encoding-based approaches. We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task. Our extensive experiments and systematic comparisons with various training approaches and a representative selection of LLMs yield new insights into their application for EIC. We investigate the generalizability of these findings on five further classification tasks. To demonstrate the proposed methods and address the data shortage for empirical edit analysis, we use our best-performing EIC model to create Re3-Sci2.0, a new large-scale dataset of 1,780 scientific document revisions with over 94k labeled edits. The quality of the dataset is assessed through human evaluation. The new dataset enables an in-depth empirical study of human editing behavior in academic writing. We make our experimental framework, models, and data publicly available.
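
To make the two approach families named in the abstract concrete, the following minimal Python sketch shows how an encoding-based classifier (a label head on a pretrained backbone) and a generation-based classifier (the label verbalized and generated as text) could be set up with the Hugging Face transformers library. The backbone names, label set, and prompt wording are illustrative assumptions, not the configuration used in the paper; in practice both variants would be fine-tuned on labeled edit pairs. See the authors' publicly released framework for their actual setup.

# Hedged sketch: encoding- vs. generation-based classification of a single edit.
# Backbones, labels, and prompt below are hypothetical placeholders.
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

LABELS = ["Grammar", "Clarity", "Fact/Evidence", "Claim", "Other"]  # assumed label set
old_sent = "The results is significant."
new_sent = "The results are statistically significant."

# Encoding-based: feed the (old, new) sentence pair through a backbone with a
# classification head and take the argmax over the label logits.
enc_tok = AutoTokenizer.from_pretrained("roberta-base")  # stand-in backbone
enc_model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS)
)
enc_inputs = enc_tok(old_sent, new_sent, return_tensors="pt")
enc_logits = enc_model(**enc_inputs).logits
print("encoding-based:", LABELS[enc_logits.argmax(-1).item()])

# Generation-based: verbalize the task as a prompt and let a causal LM
# generate the label as text.
gen_tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in for an instruction-tuned LLM
gen_model = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = (
    f"Old: {old_sent}\nNew: {new_sent}\n"
    f"Edit intent ({', '.join(LABELS)}): "
)
gen_inputs = gen_tok(prompt, return_tensors="pt")
out = gen_model.generate(
    **gen_inputs, max_new_tokens=5, pad_token_id=gen_tok.eos_token_id
)
answer = gen_tok.decode(
    out[0][gen_inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print("generation-based:", answer.strip())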

Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-289246
Classification DDC: 000 Generalities, computers, information > 004 Computer science
Divisions: 20 Department of Computer Science > Ubiquitous Knowledge Processing
Central Institutions > hessian.AI - The Hessian Center for Artificial Intelligence
Date Deposited: 17 Dec 2024 16:39
Last Modified: 19 Dec 2024 09:01
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/28924
PPN: 524710422