The InsightsNet Climate Change Corpus (ICCC) : Compiling a Multimodal Corpus of Discourses in a Multi-Disciplinary Domain

Volkanovska, Elena ; Tan, Sherry ; Duan, Changxu ; Bartsch, Sabine ; Stille, Wolfgang (2025)
The InsightsNet Climate Change Corpus (ICCC) : Compiling a Multimodal Corpus of Discourses in a Multi-Disciplinary Domain.
In: Datenbank-Spektrum : Zeitschrift für Datenbanktechnologien und Information Retrieval, 2023, 23 (3)
doi: 10.26083/tuprints-00028356
Article, Secondary publication, Publisher's Version

s13222-023-00454-1.pdf
Copyright Information: CC BY 4.0 International - Creative Commons, Attribution.

Item Type: Article
Type of entry: Secondary publication
Title: The InsightsNet Climate Change Corpus (ICCC) : Compiling a Multimodal Corpus of Discourses in a Multi-Disciplinary Domain
Language: English
Date: 16 January 2025
Place of Publication: Darmstadt
Year of primary publication: November 2023
Place of primary publication: Berlin ; Heidelberg
Publisher: Springer
Journal or Publication Title: Datenbank-Spektrum : Zeitschrift für Datenbanktechnologien und Information Retrieval
Volume of the journal: 23
Issue Number: 3
DOI: 10.26083/tuprints-00028356
Origin: Secondary publication DeepGreen
Abstract:

The discourse on climate change has become a centerpiece of public debate, thereby creating a pressing need to analyze the multitude of messages created by the participants in this communication process. In addition to text, information on this topic is conveyed multimodally, through images, videos, tables and other data objects that are embedded within documents and accompany the text. This paper presents the process of building a multimodal pilot corpus for the InsightsNet Climate Change Corpus (ICCC) and using natural language processing (NLP) tools to enrich corpus (meta)data, thus creating a dataset that lends itself to the exploration of the interplay between the various modalities that constitute the discourse on climate change. We demonstrate how the pilot corpus can be queried for relevant information in two types of databases, and how the proposed data model promotes a more comprehensive sentiment analysis approach.

Uncontrolled Keywords: Corpus, Climate change, Computational linguistics, Annotation, Metadata
Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-283563
Classification DDC: 000 Generalities, computers, information > 004 Computer science
400 Language > 400 Language, linguistics
400 Language > 420 English
Divisions: 02 Department of History and Social Science > Institut für Sprach- und Literaturwissenschaft > Corpus- und Computerlinguistik, Englische Philologie
Zentrale Einrichtungen > hessian.AI - The Hessian Center for Artificial Intelligence
Date Deposited: 16 Jan 2025 13:44
Last Modified: 16 Jan 2025 13:44
SWORD Depositor: Deep Green
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/28356