2015
First publication
Report

Composing Measures for Computing Text Similarity

File(s)
Main publication
TUD-CS-2015-0017.pdf
CC BY-NC-ND 3.0 Unported
Format: Adobe PDF
Size: 398.36 KB
TUDa URI
tuda/2665
URN
urn:nbn:de:tuda-tuprints-43429
DOI
10.26083/tuprints-00004342
Authors
Bär, Daniel
Zesch, Torsten
Gurevych, Iryna
Abstract

We present a comprehensive study of computing similarity between texts. We start from the observation that while the concept of similarity is well grounded in psychology, text similarity is much less well-defined in the natural language processing community. We thus define the notion of text similarity and distinguish it from related tasks such as textual entailment and near-duplicate detection. We then identify multiple text dimensions, i.e. characteristics inherent to texts that can be used to judge text similarity, for which we provide empirical evidence. We discuss state-of-the-art text similarity measures previously proposed in the literature, before continuing with a thorough discussion of common evaluation metrics and datasets. Based on the analysis, we devise an architecture which combines text similarity measures in a unified classification framework. We apply our system in two evaluation settings, for which it consistently outperforms prior work and competing systems: (a) an intrinsic evaluation in the context of the Semantic Textual Similarity Task as part of the Semantic Evaluation (SemEval) exercises, and (b) an extrinsic evaluation for the detection of text reuse. As a basis for future work, we introduce DKPro Similarity, an open source software package which streamlines the development of text similarity measures and complete experimental setups.

Keywords

Text Similarity

Plagiarism

Paraphrase Recognitio...

Language
English
Issuing body
UKP Lab, Technische Universität Darmstadt
Department/Field
20 Department of Computer Science > Ubiquitous Knowledge Processing
DDC
000 General works, computer science, information science > 004 Computer science
Institution
Universitäts- und Landesbibliothek Darmstadt
Place
Darmstadt
PPN
386760349
Additional links (organization)
https://www.ukp.tu-darmstadt.de/publications/details/?no_cache=1&tx_bibtex_pi1%5Bpub_id%5D=TUD-CS-2015-0017
