Category Development at the Interface of Interpretive Pragmalinguistic Annotation and Machine Learning – Annotation, Detection and Classification of linguistic routines of discourse referencing in political debates
In this paper, we present a case study on quality criteria for the robustness of categories in pragmalinguistic tagset development. We model a number of classification tasks for linguistic routines of discourse referencing in the plenary minutes of the German Bundestag. In doing so, we focus and reflect on three fundamental quality criteria: (1) segmentation, i.e. the size of the annotated segments (e.g. words, phrases or sentences); (2) granularity, i.e. the degree of content differentiation; and (3) interpretation depth, i.e. the degree to which linguistic knowledge, co-textual knowledge and extra-linguistic, context-sensitive knowledge are included. With the machine learnability of categories in mind, we concentrate on principles and conditions of category development in collaborative annotation. Our experiments and tests on pilot corpora investigate to what extent statistical measures indicate whether interpretative classifications are machine-reproducible and reliable. To this end, we compare gold-standard datasets annotated with different segment sizes (phrases, sentences) and with categories of different granularity. We conduct experiments with different machine learning frameworks to automatically predict labels from our tagset. We apply BERT ([Devlin et al. 2019]), a pre-trained neural transformer language model that we fine-tune and constrain for our labelling and classification tasks, and compare it against Naive Bayes as a probabilistic, knowledge-agnostic baseline model. The results of these experiments feed back into the development of, and reflection on, our category systems.
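To illustrate what a probabilistic, knowledge-agnostic baseline of this kind involves, the following is a minimal sketch of a multinomial Naive Bayes classifier over sentence-level segments. It is not the implementation used in the study. The class name `NaiveBayes`, the whitespace tokenizer, the two labels `referencing` and `other`, and the English toy sentences are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercased whitespace tokens (assumption: a real pipeline
    # would use a proper tokenizer for German parliamentary text).
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, segments, labels):
        # Count how often each label occurs and which tokens occur under it.
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for segment, label in zip(segments, labels):
            self.token_counts[label].update(tokenize(segment))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, segment):
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for tok in tokenize(segment):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy data with hypothetical labels; not taken from the corpus.
train = [
    ("as the colleague said in the last debate", "referencing"),
    ("you claimed in your speech that taxes fall", "referencing"),
    ("as the minister said yesterday", "referencing"),
    ("we need more funding for schools", "other"),
    ("the budget will increase next year", "other"),
    ("schools need more teachers", "other"),
]
model = NaiveBayes().fit([s for s, _ in train], [l for _, l in train])
prediction = model.predict("as the colleague said in your speech")
```

Because such a model relies only on token frequencies per label, it offers a rough lower bound: categories that it already separates well are learnable from surface form alone, whereas categories that need co-textual or extra-linguistic knowledge would be expected to benefit more from a contextual model such as BERT.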

