Validating Topic Modeling as a Method of Analyzing Sujet and Theme
In Computational Literary Studies (CLS), several procedures for thematic analysis have been adapted from NLP and computer science. Among these, topic modeling is the most prominent and popular technique. We maintain, however, that to date this procedure has been used only in the context of exploration, not in the context of justification. When we seek to prove assumptions concerning the correlation between genres, methods of computational text analysis have to be set up in research environments of justification, i.e. in environments of hypothesis testing. We provide a holistic model of validation and a conceptual disambiguation of the notion of aboutness as sujet, fabula, and theme, and we discuss essential methodological requirements for hypothesis-based analysis. Because we maintain that validation has to be performed separately for each individual task, we empirically validate topic modeling on the basis of a new corpus of German novellas and comprehensive annotations, and we draw hypothetical generalizations about the applicability of topic modeling to analyzing aboutness in the domain of narrative fiction.

