Beinborn, Lisa Marina (2016)
Predicting and Manipulating the Difficulty of Text-Completion Exercises for Language Learning.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication
DissertationBeinborn_publishedVersion_September20_online.pdf - Published Version (2MB)
Copyright Information: CC BY-NC-ND 4.0 International - Creative Commons, Attribution NonCommercial, NoDerivs.
Item Type: | Ph.D. Thesis | ||||
---|---|---|---|---|---|
Type of entry: | Primary publication | ||||
Title: | Predicting and Manipulating the Difficulty of Text-Completion Exercises for Language Learning | ||||
Language: | English | ||||
Referees: | Gurevych, Prof. Dr. Iryna ; Zesch, Prof. Dr. Torsten ; Meurers, Prof. Dr. Detmar | ||||
Date: | 20 September 2016 | ||||
Place of Publication: | Darmstadt | ||||
Date of oral examination: | 5 July 2016 | ||||
Abstract: | The increasing levels of international communication in all aspects of life lead to a growing demand for language skills. Traditional language courses now compete with a wide range of online offerings that promise higher flexibility. However, most platforms provide rather static educational content and do not yet incorporate the recent progress in educational natural language processing. In recent years, many researchers have developed new methods for automatic exercise generation, but the generated output is often either too easy or too difficult to be used with real learners. In this thesis, we address the task of predicting and manipulating the difficulty of text-completion exercises based on measurable linguistic properties to bridge the gap between technical ambition and educational needs. The main contribution consists of a theoretical model and a computational implementation for exercise difficulty prediction on the item level. This is the first automatic approach that reaches human performance levels and is applicable to various languages and exercise types. The exercises in this thesis differ with respect to the exercise content and the exercise format. As the theoretical basis for the thesis, we develop a new difficulty model that combines content and format factors and further distinguishes the dimensions of text difficulty, word difficulty, candidate ambiguity, and item dependency. It is targeted at text-completion exercises, which are a common method for fast language proficiency tests. The empirical basis for the thesis consists of five difficulty datasets containing exercises annotated with learner performance data. The difficulty is expressed as the ratio of learners who fail to solve the exercise. In order to predict the difficulty of unseen exercises, we implement the four dimensions of the model as computational measures.
For each dimension, the thesis contains the discussion and implementation of existing measures, the development of new approaches, and an experimental evaluation on sub-tasks. In particular, we developed new approaches for the tasks of cognate production, spelling difficulty prediction, and candidate ambiguity evaluation. For the main experiments, the individual measures are combined into a machine learning approach to predict the difficulty of C-tests, X-tests, and cloze tests in English, German, and French. The performance of human experts on the same task is determined by conducting an annotation study to provide a basis for comparison. The quality of the automatic prediction reaches the levels of human accuracy for the largest datasets. If we can predict the difficulty of exercises, we are also able to manipulate it. We develop a new approach for exercise generation and selection that is based on the prediction model. It reaches high acceptance ratings by human users and can be directly integrated into real-world scenarios. In addition, the measures for word difficulty and candidate ambiguity are used to improve the tasks of content and distractor manipulation. Previous work on exercise difficulty was commonly limited to manual correlation analyses using learner results. The computational approach of this thesis makes it possible to predict the difficulty of text-completion exercises in advance. This is an important contribution towards the goal of completely automated exercise generation for language learning. |
||||
Uncontrolled Keywords: | difficulty prediction, enlp, CALL, ICALL, language learning, educational natural language processing, natural language processing, computer-assisted language learning, exercise generation, text difficulty, readability, item difficulty | ||||
URN: | urn:nbn:de:tuda-tuprints-56479 | ||||
Classification DDC: | 000 Generalities, computers, information > 004 Computer science ; 400 Language > 400 Language, linguistics |
||||
Divisions: | 20 Department of Computer Science > Ubiquitous Knowledge Processing | ||||
Date Deposited: | 22 Sep 2016 09:20 | ||||
Last Modified: | 09 Jul 2020 01:24 | ||||
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/5647 | ||||
PPN: | 387142347 | ||||
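
The abstract defines the difficulty of an exercise as the ratio of learners who fail to solve it. The following is a minimal Python sketch of that ratio only. The data layout (one boolean per learner for each gap), the item names, and the function name `error_rate` are assumptions made for illustration and are not taken from the thesis.

```python
def error_rate(responses):
    """Return the fraction of learners who failed to solve one item.

    `responses` holds one boolean per learner (True = solved).
    This layout is an assumption made for illustration.
    """
    if not responses:
        raise ValueError("need at least one learner response")
    failures = sum(1 for solved in responses if not solved)
    return failures / len(responses)


# Hypothetical learner data: item id -> one outcome per learner.
item_responses = {
    "gap_1": [True, True, False, True],
    "gap_2": [False, False, True, False],
}

# Difficulty per item, expressed as the ratio of failing learners.
difficulty = {item: error_rate(r) for item, r in item_responses.items()}
print(difficulty)
```

A value close to 0 marks an item that almost every learner solved, and a value close to 1 marks an item that almost every learner failed.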