Do Large Language Models understand literature? Case studies and probing experiments on German poetry
This paper explores the capabilities of large language models (LLMs) in understanding literary texts, specifically poetry, through a series of qualitative experiments. We define "understanding" in a way that allows us to assess task-specific capabilities while avoiding anthropomorphism. Analyzing two German poems, one very well-known and one unknown, we assess nine textual aspects: meter, rhyme, assonance, lexis, phrases, syntax, figurative language, titles, and meaning. Three levels of interaction (general knowledge, expert knowledge, and abstraction and transfer) guide our evaluation. Our results show that LLMs excel in analyzing semantic aspects, including figurative speech, but struggle with formal elements such as rhythm and sound. Performance differences emerge across textual aspects rather than complexity levels. Notably, LLMs favor established interpretations over original insights and remain relatively inflexible when it comes to shifting cultural perspectives unless explicitly prompted. Overall, we show the extent to which LLM performance covaries with textual aspects rather than with levels of task complexity.

