Exploratory search in time-oriented primary data

Bernard, Jürgen (2015)
Exploratory search in time-oriented primary data.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication

Exploratory Search in Time-Oriented Primary Data - Jürgen Bernard - 2015.pdf
Copyright Information: CC BY-NC-SA 3.0 Unported - Creative Commons, Attribution, NonCommercial, ShareAlike.

Item Type: Ph.D. Thesis
Type of entry: Primary publication
Title: Exploratory search in time-oriented primary data
Language: English
Referees: Fellner, Prof. Dr. Dieter W. ; Schreck, Prof. Dr. Tobias
Date: 2 December 2015
Place of Publication: Darmstadt
Date of oral examination: 8 October 2015
Abstract:

In a variety of research fields, researchers obtain primary data that describes scientific phenomena in their original condition. Time-oriented primary data, in particular, is an indispensable data type, derived from complex measurements depending on time. Today, time-oriented primary data is collected at rates that exceed the domain experts’ abilities to seek valuable information undiscovered in the data. It is widely accepted that the large volumes of uninvestigated data will disclose tremendous knowledge in data-driven research, provided that domain experts are able to gain insight into the data. Domain experts involved in data-driven research urgently require analytical capabilities. In scientific practice, the predominant activities are the generation and validation of hypotheses. In analytical terms, these activities are often expressed as exploratory and confirmatory data analysis, respectively. Ideally, analytical support would combine the strengths of both types of activities.

Exploratory search (ES) is a concept that seamlessly includes information-seeking behaviors ranging from search to exploration. ES supports domain experts both in gaining an understanding of huge and potentially unknown data collections and in drilling down to relevant subsets, e.g., to validate hypotheses. As such, ES combines the predominant tasks of domain experts in data-driven research. For the design of useful and usable ES systems (ESS), data scientists have to incorporate different sources of knowledge and technology. Of particular importance is the state of the art in interactive data visualization and data analysis. Research on these factors is at the heart of Information Visualization (IV) and Visual Analytics (VA). Approaches in IV and VA provide meaningful visualization and interaction designs, allowing domain experts to perform the information-seeking process in an effective and efficient way. Today, best-practice ESS exist almost exclusively for textual data content, e.g., put into practice in digital libraries to facilitate the reuse of digital documents. For time-oriented primary data, ES remains mostly theoretical.

Motivation and Problem Statement. This thesis is motivated by two main assumptions. First, we expect that ES will have a tremendous impact on data-driven research in many research fields. In this thesis, we focus on time-oriented primary data as a complex and important data type for data-driven research. Second, we assume that research conducted in IV and VA will particularly facilitate ES. For time-oriented primary data, however, novel concepts and techniques are required that enhance the design and the application of ESS. In particular, we observe a lack of methodological research in ESS for time-oriented primary data. In addition, the size, the complexity, and the quality of time-oriented primary data hamper content-based access, as well as the design of visual interfaces for gaining an overview of the data content. Furthermore, the question arises of how ESS can incorporate techniques for seeking relations between data content and metadata to foster data-driven research. Overarching challenges for data scientists are to create usable and useful designs, which urgently requires the involvement of the targeted user group and support techniques for choosing meaningful algorithmic models and model parameters. Throughout this thesis, we will resolve these challenges from conceptual, technical, and systemic perspectives. In turn, domain experts can benefit from novel ESS as a powerful analytical support to conduct data-driven research.

Concepts for Exploratory Search Systems (Chapter 3). We postulate concepts for ES in time-oriented primary data. Based on a survey of analysis tasks supported in IV and VA research, we present a comprehensive selection of tasks and techniques relevant for search and exploration activities. This selection guides data scientists in the choice of meaningful techniques presented in IV and VA. Furthermore, we present a reference workflow for the design and the application of ESS for time-oriented primary data. The workflow divides the data processing and transformation process into four steps, and thus divides the complexity of the design space into manageable parts. In addition, the reference workflow describes how users can be involved in the design. The reference workflow is the framework for the technical contributions of this thesis.

Visual-Interactive Preprocessing of Time-Oriented Primary Data (Chapter 4). We present a visual-interactive system that enables users to construct workflows for preprocessing time-oriented primary data. In this way, we introduce a means of providing content-based access. Based on a rich set of preprocessing routines, users can create individual solutions for data cleansing, normalization, segmentation, and other preprocessing tasks. In addition, the system supports the definition of time series descriptors and time series distance measures. Guidance concepts support users in assessing the workflow generalizability, which is important for large data sets. The execution of the workflows transforms time-oriented primary data into feature vectors, which can subsequently be used for downstream search and exploration techniques. We demonstrate the applicability of the system in usage scenarios and case studies.
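
As a rough illustration of the kind of workflow described above, the following Python sketch chains cleansing, normalization, and segmentation, and ends with a simple descriptor and distance measure. It is not the system presented in the thesis: all function names are hypothetical, and the choice of z-normalization, a piecewise aggregate (segment mean) descriptor, and the Euclidean distance is an assumption made only for this example.

```python
import math
from statistics import mean, pstdev


def cleanse(series):
    """Drop missing values (None or NaN); a deliberately simple cleansing step."""
    return [v for v in series if v is not None and not math.isnan(v)]


def z_normalize(series):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)  # a constant series carries no shape information
    return [(v - mu) / sigma for v in series]


def paa_descriptor(series, n_segments):
    """Segment the series into n_segments chunks and describe each by its mean."""
    n = len(series)
    if n < n_segments:
        raise ValueError("series is shorter than the number of segments")
    bounds = [(i * n) // n_segments for i in range(n_segments + 1)]
    return [mean(series[a:b]) for a, b in zip(bounds, bounds[1:])]


def euclidean(u, v):
    """Distance measure between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def run_workflow(series, n_segments=4):
    """Chain the steps: cleansing -> normalization -> segmentation/descriptor."""
    return paa_descriptor(z_normalize(cleanse(series)), n_segments)


# Hypothetical raw measurement with two missing values.
raw = [1.0, None, 2.0, 3.0, float("nan"), 4.0, 5.0, 6.0, 7.0, 8.0]
features = run_workflow(raw)  # feature vector with four entries
scaled = run_workflow([10 * v for v in cleanse(raw)])
print(features)
print(euclidean(features, scaled))  # close to zero: normalization removes the scale
```

In this sketch, the resulting feature vectors are what downstream search and exploration steps would operate on; a real workflow would likely offer many alternative routines per step.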

Content-Based Overviews (Chapter 5). We introduce novel guidelines and techniques for the design of content-based overviews. The three key factors are the creation of meaningful data aggregates, the visual mapping of these aggregates into the visual space, and the view transformation providing layouts of these aggregates in the display space. For each of these steps, we characterize important visualization and interaction design parameters allowing the involvement of users. We introduce guidelines supporting data scientists in choosing meaningful solutions. In addition, we present novel visual-interactive quality assessment techniques enhancing the choice of algorithmic models and model parameters. Finally, we present visual interfaces enabling users to formulate visual queries of the time-oriented data content. In this way, we provide means of combining content-based exploration with content-based search.

Relation Seeking Between Data Content and Metadata (Chapter 6). We present novel visual interfaces enabling domain experts to seek relations between data content and metadata. These interfaces can be integrated into ESS to bridge analytical gaps between the data content and attached metadata. In three different approaches, we focus on different types of relations and define algorithmic support to guide users towards the most interesting relations. Furthermore, each of the three approaches comprises individual visualization and interaction designs, enabling users to explore both the data and the relations in an efficient and effective way. We demonstrate the applicability of our interfaces with usage scenarios, each conducted together with domain experts. The results confirm that our techniques are beneficial for seeking relations between data content and metadata, particularly for data-centered research.

Case Studies - Exploratory Search Systems (Chapter 7). In two case studies, we put our concepts and techniques into practice. We present two ESS constructed in design studies with real users, real ES tasks, and real time-oriented primary data collections. The web-based VisInfo ESS is a digital library system facilitating visual access to time-oriented primary data content. A content-based overview enables users to explore large collections of time series measurements and serves as a baseline for content-based queries by example. In addition, VisInfo provides a visual interface for querying time-oriented data content by sketch. A result visualization combines different views of the data content and metadata with faceted search functionality. The MotionExplorer ESS supports domain experts in human motion analysis. Two content-based overviews enhance the exploration of large collections of human motion capture data from two perspectives. MotionExplorer provides a search interface, allowing domain experts to query human motion sequences by example. Retrieval results are depicted in a visual-interactive view enabling the exploration of variations of human motions. Field study evaluations performed for both ESS confirm the applicability of the systems in the environment of the involved user groups. The systems yield a significant improvement of both the effectiveness and the efficiency in the day-to-day work of the domain experts. As such, both ESS demonstrate how large collections of time-oriented primary data can be reused to enhance data-centered research.
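
To make the notion of a content-based query by example more concrete, the following minimal Python sketch ranks stored feature vectors by their distance to an example vector. It is only an illustration under stated assumptions and does not reproduce VisInfo or MotionExplorer: the function name, the item identifiers, the feature values, and the use of the Euclidean distance are all hypothetical.

```python
import math


def query_by_example(example, collection, k=3):
    """Rank stored feature vectors by Euclidean distance to the example."""
    ranked = sorted(
        (math.dist(example, vector), item_id)
        for item_id, vector in collection.items()
    )
    return [(item_id, distance) for distance, item_id in ranked[:k]]


# Hypothetical collection: item identifiers mapped to feature vectors.
collection = {
    "walk_01": [0.1, 0.2, 0.1],
    "walk_02": [0.2, 0.2, 0.0],
    "jump_01": [0.9, 1.5, 0.8],
}

# The example could be an item the user picked from an overview.
hits = query_by_example([0.1, 0.2, 0.1], collection, k=2)
print([item_id for item_id, _ in hits])
```

A query by sketch could reuse the same ranking step if the sketched curve is first converted into a feature vector of the same form.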

In essence, our contributions cover the entire time series analysis process, from accessing raw time-oriented primary data, through processing and transforming time series data, to the visual-interactive analysis of time series. We present visual search interfaces providing content-based access to time-oriented primary data. In a series of novel exploration-support techniques, we facilitate both gaining an overview of large and complex time-oriented primary data collections and seeking relations between data content and metadata. Throughout this thesis, we introduce VA as a means of designing effective and efficient visual-interactive systems. Our VA techniques empower data scientists to choose appropriate models and model parameters, as well as to involve users in the design. With both principles, we support the design of usable and useful interfaces which can be integrated into ESS. In this way, our contributions bridge the gap between search systems requiring exploration support and exploratory data analysis systems requiring visual querying capability. In the ESS presented in two case studies, we demonstrate that our techniques and systems support data-driven research in an efficient and effective way.

Alternative Abstract:

Primary data describes phenomena in their original form and is therefore not subject to any alteration or manipulation. It may thus always be presumed that time-oriented primary data harbors unexplored knowledge, which is of great interest for data-centered research in particular. Time-oriented primary data is collected in elaborate projects and subsequently persisted. The size, the heterogeneity, and the temporal reference of time-oriented primary data pose great challenges for data-centered research. Retrieving unexplored knowledge requires suitable tools from the fields of confirmatory and, above all, exploratory data analysis. One vision in the research landscape is the reuse of persisted primary data, which would allow other researchers to take part in data-centered research as well. Time-oriented data in particular is often irretrievable, which further motivates its reuse. One of the decisive questions is how researchers can be given intuitive and effective access to time-oriented primary data, even when their information need is initially undetermined.

In this dissertation, I have set myself the task of supporting data-centered research in the reuse and the analysis of time-oriented primary data. To this end, I put the concept of exploratory search (ES) into practice for time-oriented primary data for the first time. Fundamentally, ES represents the idea of supporting different information needs of the user within a single system. The supported activities are meant to range from the retrieval of factual knowledge (search) to the exploration of entirely new search and information spaces (exploration). To implement ES for time-oriented primary data for the first time, I draw on techniques from Information Visualization and Visual Analytics. Information Visualization is the study of the visual-interactive representation of abstract data; Visual Analytics investigates the appropriate interplay between automated data analysis and visual data exploration. A review of related work revealed the following unsolved problems in particular. First, ES existed only as a concept, with the exception of systems for textual data. Strategies for methodically supporting the design of suitable systems were lacking. Content-based access to time-oriented primary data posed a central technical problem; search had so far been possible only via metadata (data about data). With regard to supporting exploratory data analysis, one difficulty lay in offering an overview of large amounts of time-oriented primary data in a visual search system. Furthermore, search systems had a deficit in that identifying relations between time series data (the data content) and metadata was not part of the analytical repertoire.

In this dissertation, I address these challenges and develop methods, techniques, and systems for ES in time-oriented primary data. Methods for the design of exploratory search systems for time-oriented primary data are presented (Chapter 3). Building on this, Chapters 4, 5, and 6 form the technical focus of the dissertation. First, the first Visual Analytics system for the visual-interactive preprocessing of time series data solves the problem of content-based access to time-oriented primary data. A further chapter presents guidelines and techniques for the design of overview visualizations for time series data. Finally, three novel techniques for the combined analysis of data content and metadata are presented. The technical contributions of this dissertation explicitly take into account the challenge of choosing suitable algorithmic models in the right order and with the right parameters. Furthermore, for all techniques it is described how users can be involved in the design. In Chapter 7, I validate the methods and techniques by means of two exploratory search systems for time-oriented primary data. With the results of this dissertation, I contribute to the reuse of time-oriented primary data, in particular in support of data-centered research. By defining visual-interactive search queries (query-by-sketch, query-by-example), users can search directly in the data content. With visual-interactive overview visualizations, users are moreover able to explore unknown relations in the search space and to use them to extend their knowledge. By opening the design process to the user and through the strictly visual nature of the data representation, this dissertation also contributes to user-centered design, as well as to the communication of information and knowledge from time-oriented primary data.

Original language of alternative abstract: German
URN: urn:nbn:de:tuda-tuprints-51739
Classification DDC: 000 Generalities, computers, information > 004 Computer science
000 Generalities, computers, information > 020 Library and information sciences
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Interactive Graphics Systems
Date Deposited: 16 Dec 2015 14:38
Last Modified: 09 Jul 2020 01:11
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/5173
PPN: 38682116X