Arnold, Thomas Otmar (2018)
Advanced Motif Analysis on Text Induced Graphs.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication
Text: Dissertation_ThomasOtmarArnold.pdf (4MB)
Copyright Information: CC BY-NC-ND 4.0 International - Creative Commons, Attribution NonCommercial, NoDerivs.
Item Type: | Ph.D. Thesis | ||||
---|---|---|---|---|---|
Type of entry: | Primary publication | ||||
Title: | Advanced Motif Analysis on Text Induced Graphs | ||||
Language: | English | ||||
Referees: | Weihe, Prof. Dr. Karsten ; Gurevych, Prof. Dr. Iryna ; Müller-Hannemann, Prof. Dr. Matthias | ||||
Date: | 30 May 2018 | ||||
Place of Publication: | Darmstadt | ||||
Date of oral examination: | 24 May 2018 | ||||
Abstract: | Motif analysis counts the occurrences of recurring patterns (motifs) in a graph and connects these counts to the intrinsic semantics of the graph. In this thesis, we demonstrate the potential of motif analysis on textual data and introduce new concepts that extend conventional motifs. In particular, we focus on three main research questions:
1. Can we use graph motifs to assess text quality? Based on the open encyclopedia Wikipedia, we transform articles of various quality levels into graph structures. There, we find motifs that indicate high or low article quality, and we connect these motifs to linguistic patterns. We also show that a qualitative analysis of the most relevant patterns can yield fruitful insights into our understanding of quality. We then look at quality from a very different angle and analyze motifs in the user interaction of collaborative writing communities. These interaction motifs allow us to assess overall online community success, measured by a combination of growth and user traffic. Certain combinations of user groups show consistently beneficial or detrimental effects on community performance.
2. How do motifs change over time? Having established that motif analysis can detect quality on different levels, we now focus on the progression of motifs in dynamic graphs. We take another look at Wikipedia articles, in particular at local text changes in article revisions. To capture patterns in these text revisions, we introduce metamotifs, or motifs of motifs. We also define the novel concept of motif stability: motifs of high stability tend to persist in dynamic graphs, whereas motifs of low stability almost always change into other motifs. We present strong correlations between motif stability, established motif characteristics, and the quality of the source text.
3. Are metamotifs (motifs of motifs) an improvement over simple motifs and methods? Finally, we confirm the capabilities of metamotifs and quantify their predictive power in a classification experiment on political speeches. To generalize beyond the surface text level, we use semantic frames, which are more abstract than words. With a combination of semantic frames and metamotif analysis on US presidency and German Bundestag data, we confirm that metamotifs outperform traditional motifs and simpler approaches when used as machine learning features. |
||||
URN: | urn:nbn:de:tuda-tuprints-74428 | ||||
Classification DDC: | 000 Generalities, computers, information > 004 Computer science ; 400 Language > 400 Language, linguistics |
||||
Divisions: | 20 Department of Computer Science > Algorithmics ; DFG-Graduiertenkollegs > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources |
||||
Date Deposited: | 01 Jun 2018 06:34 | ||||
Last Modified: | 09 Jul 2020 02:06 | ||||
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/7442 | ||||
PPN: | 432240446 | ||||
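Illustrative note on the abstract: the idea of counting motifs and tracking whether they persist between graph revisions can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the method used in the thesis. It assumes undirected graphs and just two 3-node motifs (open path and triangle). The helper names (`motif_map`, `motif_counts`, `stability`) and the toy revisions `rev1` and `rev2` are hypothetical. The stability measure shown (the fraction of a motif's occurrences that keep the same type in the next revision) is one simple way the concept might be operationalized and may differ from the thesis's definition.

```python
from itertools import combinations


def classify_triple(edges, triple):
    """Return 'triangle', 'path', or None for three nodes of an undirected graph."""
    present = sum(1 for pair in combinations(triple, 2) if frozenset(pair) in edges)
    if present == 3:
        return "triangle"
    if present == 2:
        return "path"
    return None  # triples with fewer than two edges are not connected


def motif_map(nodes, edge_list):
    """Map every connected node triple to its motif type."""
    edges = {frozenset(e) for e in edge_list}
    result = {}
    for triple in combinations(sorted(nodes), 3):
        kind = classify_triple(edges, triple)
        if kind is not None:
            result[triple] = kind
    return result


def motif_counts(motifs):
    """Count how often each motif type occurs."""
    counts = {"triangle": 0, "path": 0}
    for kind in motifs.values():
        counts[kind] += 1
    return counts


def stability(before, after, kind):
    """Assumed measure: share of `kind` occurrences that stay `kind` in the next revision."""
    occurrences = [t for t, k in before.items() if k == kind]
    if not occurrences:
        return None
    kept = sum(1 for t in occurrences if after.get(t) == kind)
    return kept / len(occurrences)


# Hypothetical toy data: two revisions of a small graph; edge a-c is removed in rev2.
nodes = ["a", "b", "c", "d"]
rev1 = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
rev2 = [("a", "b"), ("b", "c"), ("c", "d")]

m1 = motif_map(nodes, rev1)
m2 = motif_map(nodes, rev2)

print(motif_counts(m1))
print(motif_counts(m2))
print(stability(m1, m2, "triangle"), stability(m1, m2, "path"))
```

Enumerating all node triples is cubic in the number of nodes, so this only suits small illustrative graphs; larger graphs would need a dedicated subgraph-enumeration algorithm.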