Lebenszyklusinformationen von Wissensdokumenten - Erfassung, Verwaltung und Validierung

Lehmann, Lasse (2010)
Lebenszyklusinformationen von Wissensdokumenten - Erfassung, Verwaltung und Validierung.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication

PDF: DissLehmann_genehmigt.pdf (3MB)
Copyright Information: In Copyright.

Item Type:

Ph.D. Thesis

Type of entry:

Primary publication

Title:

Lebenszyklusinformationen von Wissensdokumenten - Erfassung, Verwaltung und Validierung

Language:

German

Referees:

Steinmetz, Prof. Dr. Ralf; Hemmje, Prof. Dr. Matthias

Date:

28 April 2010

Place of Publication:

Darmstadt

Date of oral examination:

26 March 2010

Abstract:

As the number of digitally available documents grows, so do users' problems in organizing these documents, whether personally or within a group. For knowledge workers in particular, however, quickly finding the documents relevant to their work is essential for working effectively. Yet in many cases users have trouble rediscovering documents that they or members of their group have stored. This goes so far that users would rather search the Internet again for documents they once downloaded from it and saved in the file system than search for them on their local computer. Often they do not even know that documents in which group members have recorded their knowledge exist at all. One reason for the poor retrievability of locally managed documents is that little additional information about such knowledge documents is available. Document metadata is rarely maintained and in most cases contains no more than the information generated automatically by the operating system or by the application used to edit the respective document type. This information is usually not very meaningful and is therefore often unsuitable for improving the situation described above. In addition, the tools used to search for and organize documents, such as the Windows file system explorer, do not support users sufficiently. This work is based on the observation that a large amount of information about documents arises from actions performed on them. A document is, for example, opened, read, edited or used. These processes produce information that can be used to manage documents or to help find them again. Usually, this information is lost unless it is captured and stored during the respective processes.
Because of the high effort involved, this information is not captured manually. This dissertation therefore takes the approach of automatically deriving metadata from the processes that take place on a knowledge document during its lifecycle, and of managing the information obtained in this way and making it usable. To this end, it first analyzes which information arises during the lifecycle of a knowledge document. Based on existing lifecycle models, a lifecycle model for knowledge documents is developed. Lifecycle information is defined and divided into usage information and relation information. Using the lifecycle model, the information that arises in the different phases is identified. The main focus of this dissertation is on relation information that arises when knowledge documents are reused. Before lifecycle information can be used, it must be captured, managed appropriately and made accessible across systems. Finally, it must be ensured that the captured information remains valid. All of these aspects are addressed in this work. A framework for the automated capture, management and use of lifecycle information is designed, implemented and evaluated. This framework includes a plug-in-based concept for capturing the information that can be transferred to almost any application. Two different concepts for capturing information are identified and implemented as capture components for three different applications. The captured information is managed and provided by a server.
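The split into usage information and relation information, together with the plug-in-based capture concept, can be illustrated with a small Python sketch. All class and method names here (`UsageInfo`, `RelationInfo`, `LifecycleStore`, `CapturePlugin`, `on_open`, `on_copy`, `on_paste`) are hypothetical; the thesis's actual schema and plug-in interface are not described in this record, and the server-side store is replaced by in-memory lists. The sketch assumes that copying text from one document and pasting it into another is a reuse action that creates relation information.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class UsageInfo:
    """Usage information: an action performed on a single document."""
    doc_id: str
    action: str  # e.g. "open", "edit"
    timestamp: datetime


@dataclass
class RelationInfo:
    """Relation information: a link between two documents created by reuse."""
    source_id: str
    target_id: str
    kind: str      # e.g. "copy_paste"
    fragment: str  # reused text, kept so the relation can be validated later
    timestamp: datetime


@dataclass
class LifecycleStore:
    """In-memory stand-in for the server-side store (hypothetical)."""
    usage: List[UsageInfo] = field(default_factory=list)
    relations: List[RelationInfo] = field(default_factory=list)


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


class CapturePlugin:
    """Hypothetical capture component hooked into an editor's open/copy/paste events."""

    def __init__(self, store: LifecycleStore, clock: Callable[[], datetime] = utc_now):
        self.store = store
        self.clock = clock
        self._clip_source: Optional[str] = None
        self._clip_text = ""

    def on_open(self, doc_id: str) -> None:
        self.store.usage.append(UsageInfo(doc_id, "open", self.clock()))

    def on_copy(self, doc_id: str, text: str) -> None:
        # Remember which document the clipboard content came from.
        self._clip_source = doc_id
        self._clip_text = text

    def on_paste(self, doc_id: str) -> None:
        # Pasting content from a different document creates a reuse relation.
        if self._clip_source is not None and self._clip_source != doc_id:
            self.store.relations.append(
                RelationInfo(self._clip_source, doc_id, "copy_paste",
                             self._clip_text, self.clock()))
        self.store.usage.append(UsageInfo(doc_id, "edit", self.clock()))
```

For example, after `plugin.on_open("report.docx")`, `plugin.on_copy("report.docx", "...")` and `plugin.on_paste("slides.pptx")`, the store holds two usage entries and one `copy_paste` relation from `report.docx` to `slides.pptx`. Keeping the reused fragment with the relation is a design choice in this sketch that makes later validation possible.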
For managing the information, the work presents a schema for lifecycle information that in particular covers the capture and management of relation information, for which no adequate solution has existed so far. In addition, concepts for various usage scenarios of lifecycle information are developed, and two of these scenarios are implemented as prototypes. Especially for relation information, it is necessary to guarantee the validity of the captured information. If an action can create a relation between two documents, there can also be an action that invalidates this relation. To address this, the work presents two validation algorithms for relation information and evaluates them on different corpora. It is shown that the proposed algorithms deliver better results on the tested corpora than state-of-the-art approaches. It is further shown that the proposed algorithms can be used in various other application scenarios. The user-based evaluation of the implemented framework, carried out as part of this work, shows that valid lifecycle information can be captured with high reliability. By automatically capturing lifecycle information of knowledge documents, this work thus lays the foundation and creates the prerequisites for using this additional information in many scenarios.
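One way to check whether a captured reuse relation is still valid is to test whether the reused fragment still occurs in the target document. The Python sketch below does this with winnowing-style fingerprints (Schleimer, Wilkerson and Aiken, 2003), in line with the record's keywords fingerprinting and string matching. It is not one of the two validation algorithms from the thesis; the function names and the parameters `k`, `w` and `threshold` are illustrative assumptions.

```python
import re
import zlib
from typing import List, Set


def normalize(text: str) -> str:
    # Lower-case and drop whitespace/punctuation so that reformatting
    # alone does not break a match.
    return re.sub(r"\W+", "", text.lower())


def kgram_hashes(text: str, k: int) -> List[int]:
    # Deterministic hash of every character k-gram.
    return [zlib.crc32(text[i:i + k].encode("utf-8"))
            for i in range(len(text) - k + 1)]


def winnow(text: str, k: int = 5, w: int = 4) -> Set[int]:
    """Keep the minimum hash of each window of w consecutive k-gram hashes."""
    hashes = kgram_hashes(normalize(text), k)
    if len(hashes) < w:
        return set(hashes)
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def relation_is_valid(fragment: str, target_text: str,
                      threshold: float = 0.5, k: int = 5, w: int = 4) -> bool:
    """Treat a relation as valid if enough of the fragment's fingerprints
    still occur in the target document (hypothetical criterion)."""
    fp_fragment = winnow(fragment, k, w)
    if not fp_fragment:
        return False  # fragment too short to fingerprint
    fp_target = winnow(target_text, k, w)
    containment = len(fp_fragment & fp_target) / len(fp_fragment)
    return containment >= threshold
```

If the fragment is still present verbatim (up to case, whitespace and punctuation), every window lying entirely inside it reappears unchanged in the target, so the containment is 1.0; once the fragment is deleted, the containment drops toward 0. The threshold trades tolerance to later edits of the fragment against false positives.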

Alternative Abstract (English):

With the growing number of digitally available documents, users increasingly struggle to organize these documents, whether personally or collaboratively. For knowledge workers in particular, fast retrieval of relevant documents is essential. In many cases, users have trouble rediscovering documents that they or members of their working group once stored somewhere. Users often prefer to download documents from the Internet again rather than search for them in their local file system. Often they do not even know that documents in which members of their working group have recorded relevant knowledge exist at all. One reason for the poor retrievability of documents organized in file systems is the lack of information about such documents. Metadata is rarely maintained and usually consists only of information provided automatically by the operating system or the editing application. This information is seldom distinctive and does not help to improve the situation described above. Furthermore, existing tools for retrieving and managing documents, such as the Windows Filesystem Explorer, do not support users sufficiently. This thesis is based on the observation that a wealth of information emerges from actions performed on a document during its lifecycle. Users open, read, edit or use documents repeatedly. These processes give rise to information that can support both the management and the retrieval of these documents. However, most of this information is lost if it is not captured during those processes. Creating such information manually would require too much effort and be too costly. The underlying approach of this thesis is therefore the automatic capture of information emerging from processes conducted on a document during its lifecycle. The information thus acquired is then organized, processed and made accessible for use. First, we analyze which kinds of information emerge during the lifecycle of a knowledge document.
Based on existing lifecycle models from other domains, we develop a lifecycle model for knowledge documents. We define the concept of lifecycle information and divide it into relation information and usage information. On this basis, we identify the information that emerges during the different phases of a document's lifecycle. We focus on relation information emerging from reuse processes conducted on knowledge documents. Before lifecycle information can be used, it has to be captured, managed and made accessible across different systems and applications. Finally, we have to ensure that the captured relations remain valid. All of these aspects are addressed in this thesis. We have designed, implemented and evaluated a framework for the automatic capture, management and use of lifecycle information. The framework uses a plug-in-based concept for capturing information that is portable to almost arbitrary applications. We have identified two different means of capturing valid relations and implemented both in three different capture components. The captured information is managed and provided by a server. We propose a scheme for organizing lifecycle information that specifically covers the capture and management of relation information, for which no sufficient solution existed so far. Furthermore, we have designed various scenarios for using lifecycle information and implemented two of them as prototypes. Especially for relation information, it is necessary to ensure the validity of the captured information. If there is an action that gives rise to a relation, there may also be actions that invalidate it. To address this issue, we have designed two algorithms for the automatic validation of relation information and evaluated them on different corpora.
On the given corpora, our algorithms achieve higher quality than state-of-the-art approaches while requiring less storage. We furthermore show that the proposed algorithms can be applied in various additional scenarios. The user-based evaluation we conducted of the proposed framework shows that valid lifecycle information can be captured with high reliability. Through the automatic capture of lifecycle information of knowledge documents, this thesis creates the basis and prerequisite for using this new kind of information in various scenarios.

Uncontrolled Keywords:

Lebenszyklus, Lebenszyklusinformationen, Erfassung, Metadaten, Fingerprinting, Stringmatching, Beziehungsinformationen

Alternative keywords:

Alternative keywords	Language
Lebenszyklus, Lebenszyklusinformationen, Erfassung, Metadaten, Fingerprinting, Stringmatching, Beziehungsinformationen	German
lifecycle, lifecycle information, capture, metadata, fingerprinting, string matching, relation information	English

URN:

urn:nbn:de:tuda-tuprints-21363

Classification DDC:

600 Technology, medicine, applied sciences > 620 Engineering and machine engineering
000 Generalities, computers, information > 004 Computer science

Divisions:

18 Department of Electrical Engineering and Information Technology > Institute of Computer Engineering > Multimedia Communications

Date Deposited:

04 May 2010 08:46

Last Modified:

08 Jul 2020 23:44

URI:

https://tuprints.ulb.tu-darmstadt.de/id/eprint/2136