Efficient Decomposition-Based Multiclass and Multilabel Classification

Park, Sang-Hyeun (2012)
Efficient Decomposition-Based Multiclass and Multilabel Classification.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication

Preview

PDF
diss_shpark.pdf
Copyright Information: CC BY-NC-ND 2.5 Generic - Creative Commons, Attribution, NonCommercial, NoDerivs .
Download (1MB) | Preview

Item Type:

Ph.D. Thesis

Type of entry:

Primary publication

Title:

Efficient Decomposition-Based Multiclass and Multilabel Classification

Language:

English

Referees:

Fürnkranz, Prof. Dr. Johannes ; Hüllermeier, Prof. Dr. Eyke

Date:

30 May 2012

Place of Publication:

Darmstadt

Date of oral examination:

24 May 2012

Abstract:

Decomposition-based methods are widely used for multiclass and multilabel classification. These approaches transform or reduce the original task to a set of smaller possibly simpler problems and allow thereby often to utilize many established learning algorithms, which are not amenable to the original task. Even for directly applicable learning algorithms, the combination with a decomposition-scheme may outperform the direct approach, e.g., if the resulting subproblems are simpler (in the sense of learnability). This thesis addresses mainly the efficiency of decomposition-based methods and provides several contributions improving the scalability with respect to the number of classes or labels, number of classifiers and number of instances.

Initially, we present two approaches improving the efficiency of the training phase of multiclass classification. The first of them shows that by minimizing redundant learning processes, which can occur in decomposition-based approaches for multiclass problems, the number of operations in the training phase can be significantly reduced. The second approach is tailored to Naive Bayes as base learner. By a tight coupling of Naive Bayes and arbitrary decompositions, it allows an even higher reduction of the training complexity with respect to the number of classifiers. Moreover, an approach improving the efficiency of the testing phase is also presented. It is capable of reducing testing effort with respect to the number of classes independently of the base learner.

Furthermore, efficient decomposition-based methods for multilabel classification are also addressed in this thesis. Besides proposing an efficient prediction method, an approach rebalancing predictive performance, time and memory complexity is presented. Aside from the efficiency-focused methods, this thesis contains also a study about a special case of the multilabel classification setting, which is elaborated, formalized and tackled by a prototypical decomposition-based approach.

Alternative Abstract:

Alternative Abstract

Language

Multiklassen- und Multilabel-Klassifikationsprobleme werden häufig durch zerlegungsbasierte Ansätze gelöst. Zerlegungsbasierte Ansätze haben gemeinsam, dass sie das ursprüngliche Problem auf eine Menge von kleineren potentiell einfacheren Problemen abbilden. Oft ermöglichen solche Ansätze die Wiederverwendung von vielen bewährten Lernalgorithmen, die nicht direkt auf das ursprüngliche Problem anwendbar sind. Darüber hinaus können auch für direkt anwendbare Lernalgorithmen die zerlegten Teilprobleme einfacher (im Sinne der Lernbarkeit) sein, so dass ein zerlegungsbasierter Ansatz insgesamt eine höhere Vorhersagequalität besitzen kann als die direkte Lösung des Problems. Diese Dissertation beschäftigt sich hauptsächlich mit der Effizienz der zerlegungsbasierten Methoden und erarbeitet mehrere Ansätze mit einer besseren Skalierbarkeit bezüglich Anzahl der Klassen bzw. Labels, Klassifizierer und Instanzen der Daten.

Es werden zunächst zwei Ansätze vorgestellt, welche die Trainingsphase für Multiklassenprobleme beschleunigen. In dem ersten Ansatz wird gezeigt, dass durch Minimierung von redundanten Lernvorgängen, die oft in zerlegungsbasierten Multiklassen-Klassifikationsansätzen vorkommen können, Einsparungen in der Trainingsphase möglich sind. Der zweite Ansatz ist speziell auf Naive Bayes als Basislerner ausgerichtet und ermöglicht durch die Ausnutzung spezieller Eigenschaften in diesem Fall eine noch größere Reduktion der Lernkomplexität bezüglich der Klassifiziereranzahl. Es wird zusätzlich ein Ansatz präsentiert, welches die Klassifikationsphase für Multiklassenprobleme beschleunigt. Dieses Verfahren ist unabhängig vom verwendeten Basislerner und reduziert die Klassifikationskomplexität bezüglich der Klassenanzahl.

Darüber hinaus werden in dieser Dissertation auch Multilabelprobleme behandelt und dafür neben einer effizienten Klassifikationsmethode auch ein Ansatz vorgestellt, welches die Vorhersagequalität, den Zeitaufwand und die Speicherkomplexität neu abwägt. Neben den effizienzfokussierten Ansätzen beinhaltet diese Dissertation auch eine Studie, die einen Spezialfall von Multilabel-Klassifikationsproblemen vorstellt, formalisiert und mittels einem prototypischen zerlegungsbasierten Ansatz zu lösen versucht.

German

Uncontrolled Keywords:

efficient classification, efficient decoding, efficient training, decomposition-based, multiclass, multilabel, classification, error-correcting output codes, aggregation

Alternative keywords:

Alternative keywords	Language
Effiziente Klassifikation, Effiziente Dekodierung, Effiziente Trainingsphase, zerlegungsbasiert, Multiklassen, Multilabel, Klassifikation, Error-Correcting Output Codes, Aggregation	German

URN:

urn:nbn:de:tuda-tuprints-29942

Classification DDC:

000 Generalities, computers, information > 004 Computer science

Divisions:

20 Department of Computer Science
20 Department of Computer Science > Knowledge Engineering

Date Deposited:

01 Jun 2012 07:18

Last Modified:

07 Dec 2012 12:05

URI:

https://tuprints.ulb.tu-darmstadt.de/id/eprint/2994