Evaluation of Measures of Distinctiveness. Classification of Literary Texts on the Basis of Distinctive Words

Du, Keli ; Dudar, Julia ; Schöch, Christof (2023)
Evaluation of Measures of Distinctiveness. Classification of Literary Texts on the Basis of Distinctive Words.
In: Journal of Computational Literary Studies, 2022, 1 (1)
doi: 10.26083/tuprints-00023252
Article, Secondary publication, Publisher's Version

Files:
jcls-102-du.pdf (Text, 630kB)
jcls-102-du.xml (Text, 93kB)
Copyright Information: CC BY 4.0 International - Creative Commons, Attribution.
Item Type: Article
Type of entry: Secondary publication
Title: Evaluation of Measures of Distinctiveness. Classification of Literary Texts on the Basis of Distinctive Words
Language: English
Date: 2023
Place of Publication: Darmstadt
Year of primary publication: 2022
Publisher: Universitäts- und Landesbibliothek Darmstadt
Journal or Publication Title: Journal of Computational Literary Studies
Volume of the journal: 1
Issue Number: 1
Collation: 21 pages
DOI: 10.26083/tuprints-00023252
Origin: Secondary publication from TUjournals
Abstract:

This paper concerns an empirical evaluation of nine different measures of distinctiveness or ‘keyness’ in the context of Computational Literary Studies. We use nine different sets of literary texts (specifically, novels) written in seven different languages as a basis for this evaluation. The evaluation is performed as a downstream classification task, where segments of the novels need to be classified by subgenre or period of first publication. The classifier receives different numbers of features identified using different measures of distinctiveness. The main contribution of our paper is that we can show that across a wide variety of parameters, but especially when only a small number of features is used, (more recent) dispersion-based measures very often outperform other (more established) frequency-based measures by significant margins. Our findings support an emerging trend to consider dispersion as an important property of words in addition to frequency.

Uncontrolled Keywords: keyness, evaluation, literary texts, distinctiveness
Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-232529
Additional Information:

Originally a conference publication: 1st Annual Conference of Computational Literary Studies, 1-2 June 2022, Darmstadt, Germany

Classification DDC: 800 Literature > 800 Literature, rhetoric and criticism
Divisions: 02 Department of History and Social Science > Institut für Sprach- und Literaturwissenschaft > Digital Philology – Modern German Literary Studies
Date Deposited: 21 Feb 2023 10:19
Last Modified: 23 Feb 2023 08:57
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/23252