Stance Detection Benchmark: How Robust is Your Stance Detection?

Schiller, Benjamin ; Daxenberger, Johannes ; Gurevych, Iryna (2024)
Stance Detection Benchmark: How Robust is Your Stance Detection?
In: KI - Künstliche Intelligenz : German Journal of Artificial Intelligence, 2021, 35 (3-4)
doi: 10.26083/tuprints-00023506
Article, Secondary publication, Publisher's Version

Copyright Information: CC BY 4.0 International - Creative Commons, Attribution.

Item Type: Article
Type of entry: Secondary publication
Title: Stance Detection Benchmark: How Robust is Your Stance Detection?
Language: English
Date: 2 April 2024
Place of Publication: Darmstadt
Year of primary publication: November 2021
Place of primary publication: Berlin
Publisher: Springer
Journal or Publication Title: KI - Künstliche Intelligenz : German Journal of Artificial Intelligence
Volume of the journal: 35
Issue Number: 3-4
DOI: 10.26083/tuprints-00023506
Origin: Secondary publication DeepGreen

Stance detection (StD) aims to detect an author’s stance towards a certain topic and has become a key component in applications such as fake news detection, claim validation, and argument search. However, while humans detect stance easily, machine learning (ML) models clearly fall short on this task. Given the major differences in dataset sizes and in the framing of StD (e.g. number of classes and inputs), ML models trained on a single dataset usually generalize poorly to other domains. Hence, we introduce a StD benchmark that allows ML models to be compared against a wide variety of heterogeneous StD datasets and evaluated for generalizability and robustness. Moreover, the framework is designed for easy integration of new datasets and of probing methods for robustness. Amongst several baseline models, we define a model that learns from all ten StD datasets of various domains in a multi-dataset learning (MDL) setting and present new state-of-the-art results on five of the datasets. Yet, the models still perform well below human capabilities, and even simple perturbations of the original test samples (adversarial attacks) severely hurt the performance of MDL models. Deeper investigation suggests overfitting on dataset biases as the main reason for the decreased robustness. Our analysis emphasizes the need to focus on robustness and de-biasing strategies in multi-task learning approaches. To foster research on this important topic, we release the dataset splits, code, and fine-tuned weights.

Uncontrolled Keywords: Stance detection, Robustness, Multi-dataset learning
Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-235062
Additional Information: NLP and Semantics

Classification DDC: 000 Generalities, computers, information > 004 Computer science
Divisions: 20 Department of Computer Science > Ubiquitous Knowledge Processing
Date Deposited: 02 Apr 2024 11:20
Last Modified: 03 Apr 2024 06:40
SWORD Depositor: Deep Green
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/23506
PPN: 516764616