Mélanie-Becquet, Frédérique ; Barré, Jean ; Seminck, Olga ; Plancq, Clément ; Naguib, Marco ; Pastor, Martial ; Poibeau, Thierry (2024)
BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature.
doi: 10.26083/tuprints-00027396
Report, Primary publication, Preprint
File: 3924_BookNLPfr_Conference_Version.pdf (2 MB). License: CC BY 4.0 International (Creative Commons Attribution).
Item Type: | Report |
---|---|
Type of entry: | Primary publication |
Title: | BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature |
Language: | English |
Date: | 28 May 2024 |
Place of Publication: | Darmstadt |
Issue Number: | 1 |
Series: | CCLS2024 Conference Preprints |
Series Volume: | 3 |
Collation: | 34 pages |
DOI: | 10.26083/tuprints-00027396 |
Abstract: | This paper presents BookNLP-fr: the adaptation to French of BookNLP, an existing NLP pipeline tailored for literary texts in English. We provide an overview of the challenges involved in the adaptation of such a pipeline to a new language: from the challenges related to data annotation up to the development of specialized modules of entity recognition and coreference. Moving beyond the technical aspects, we explore practical applications of BookNLP-fr with a canonical task for computational literary studies: subgenre classification. We show that BookNLP-fr provides more relevant and – even more importantly – more interpretable features to perform automatic subgenre classification than the traditional bag-of-words approach. BookNLP-fr makes NLP techniques available to a larger public and constitutes a new toolkit to process large numbers of digitized books in French. This allows the field to gain a deeper literary understanding through the practice of distant reading. |
Uncontrolled Keywords: | Natural Language Processing, Computational Literary Studies, French Literature, Coreference Resolution, Entity Recognition, Subgenre Classification |
Status: | Preprint |
URN: | urn:nbn:de:tuda-tuprints-273969 |
Additional Information: | This paper has been submitted to the conference track of JCLS. It has been peer-reviewed and accepted for presentation and discussion at the 3rd Annual Conference of Computational Literary Studies in Vienna, Austria, in June 2024. |
Classification DDC: | 800 Literature > 800 Literature, rhetoric and criticism |
Divisions: | 02 Department of History and Social Science > Institut für Sprach- und Literaturwissenschaft > Digital Philology – Modern German Literary Studies |
Date Deposited: | 28 May 2024 07:53 |
Last Modified: | 31 Jul 2024 13:36 |
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/27396 |
PPN: | 518965619 |