An Inclusive Notion of Text

Kuznetsov, Ilia ; Gurevych, Iryna (2024)
An Inclusive Notion of Text.
The 61st Annual Meeting of the Association for Computational Linguistics. Toronto, Canada (09.-14.07.2023)
doi: 10.26083/tuprints-00027658
Conference or Workshop Item, Secondary publication, Publisher's Version

Text
2023.acl-long.633.pdf
Copyright Information: CC BY 4.0 International - Creative Commons, Attribution.

Download (1MB)
Video
2023.acl-long.633.mp4
Copyright Information: CC BY 4.0 International - Creative Commons, Attribution.

Download (12MB)
Item Type: Conference or Workshop Item
Type of entry: Secondary publication
Title: An Inclusive Notion of Text
Language: English
Date: 8 July 2024
Place of Publication: Darmstadt
Year of primary publication: 2023
Place of primary publication: Kerrville, TX, USA
Publisher: ACL
Book Title: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Event Title: The 61st Annual Meeting of the Association for Computational Linguistics
Event Location: Toronto, Canada
Event Dates: 09.-14.07.2023
DOI: 10.26083/tuprints-00027658
Origin: Secondary publication service
Abstract:

Natural language processing (NLP) researchers develop models of grammar, meaning and communication based on written text. Due to task and data differences, what is considered text can vary substantially across studies. A conceptual framework for systematically capturing these differences is lacking. We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP. Towards that goal, we propose common terminology to discuss the production and transformation of textual data, and introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling. We apply this taxonomy to survey existing work that extends the notion of text beyond the conservative language-centered view. We outline key desiderata and challenges of the emerging inclusive approach to text in NLP, and suggest community-level reporting as a crucial next step to consolidate the discussion.

Identification Number: 2023.acl-long.633
Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-276586
Classification DDC: 000 Generalities, computers, information > 004 Computer science
Divisions: 20 Department of Computer Science > Ubiquitous Knowledge Processing
Date Deposited: 08 Jul 2024 09:23
Last Modified: 17 Jul 2024 11:55
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/27658
PPN: 519664728