
BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature

Mélanie-Becquet, Frédérique ; Barré, Jean ; Seminck, Olga ; Plancq, Clément ; Naguib, Marco ; Pastor, Martial ; Poibeau, Thierry (2024)
BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature.
doi: 10.26083/tuprints-00027396
Report, Primary publication, Preprint

3924_BookNLPfr_Conference_Version.pdf
Copyright Information: CC BY 4.0 International - Creative Commons, Attribution.

Item Type: Report
Type of entry: Primary publication
Title: BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature
Language: English
Date: 28 May 2024
Place of Publication: Darmstadt
Issue Number: 1
Series: CCLS2024 Conference Preprints
Series Volume: 3
Collation: 34 pages
DOI: 10.26083/tuprints-00027396
Abstract:

This paper presents BookNLP-fr: the adaptation to French of BookNLP, an existing NLP pipeline tailored for literary texts in English. We provide an overview of the challenges involved in the adaptation of such a pipeline to a new language: from the challenges related to data annotation up to the development of specialized modules of entity recognition and coreference. Moving beyond the technical aspects, we explore practical applications of BookNLP-fr with a canonical task for computational literary studies: subgenre classification. We show that BookNLP-fr provides more relevant and – even more importantly – more interpretable features to perform automatic subgenre classification than the traditional bag-of-words approach. BookNLP-fr makes NLP techniques available to a larger public and constitutes a new toolkit to process large numbers of digitized books in French. This allows the field to gain a deeper literary understanding through the practice of distant reading.

Uncontrolled Keywords: Natural Language Processing, Computational Literary Studies, French Literature, Coreference Resolution, Entity Recognition, Subgenre Classification
Status: Preprint
URN: urn:nbn:de:tuda-tuprints-273969
Additional Information:

This paper has been submitted to the conference track of JCLS. It has been peer reviewed and accepted for presentation and discussion at the 3rd Annual Conference of Computational Literary Studies in Vienna, Austria, in June 2024.

Classification DDC: 800 Literature > 800 Literature, rhetoric and criticism
Divisions: 02 Department of History and Social Science > Institut für Sprach- und Literaturwissenschaft > Digital Philology – Modern German Literary Studies
Date Deposited: 28 May 2024 07:53
Last Modified: 31 Jul 2024 13:36
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/27396
PPN: 518965619