Conditional Random Fields for Detection of Visual Object Classes

Schnitzspan, Paul (2010)
Conditional Random Fields for Detection of Visual Object Classes.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication

Copyright Information: CC BY-NC-ND 2.5 Generic - Creative Commons, Attribution, NonCommercial, NoDerivs.

Item Type:

Ph.D. Thesis

Type of entry:

Primary publication

Title:

Conditional Random Fields for Detection of Visual Object Classes

Language:

English

Referees:

Roth, Prof. Ph.D. Stefan ; Schiele, Prof. Dr. Bernt

Date:

8 September 2010

Place of Publication:

Darmstadt

Date of oral examination:

3 September 2010

Abstract:

High-level computer vision tasks, such as object detection in single images, are of growing importance for our everyday lives. Reliable systems for object detection, in particular, may simplify our lives significantly or make them safer (e.g. in driver assistance scenarios). This dissertation studies object detection in challenging scenes based on graphical models. Graphical models lend themselves to the analysis and design of computer vision algorithms because their modularity allows complex models to be built from simpler modules. This modularity and decomposability enables a better understanding of the domain of interest, which in turn enables the design of more reliable models. In this dissertation we study discriminative, undirected graphical models, namely conditional random fields (CRFs), and propose extensions to standard CRFs in order to address object detection in challenging scenes. The use of CRFs allows a fundamental understanding of the structure of the domain of interest, which is crucial for reliably handling such scenes. We discuss the advantages of discriminative models over generative variants in the presence of cluttered background, partial occlusion and viewpoint variation. While standard CRFs are restricted to fixed, local neighborhood dependencies, we propose to learn arbitrary graph structures. Furthermore, we take advantage of the decomposability of graphical models: we interpret the random variables as object parts and develop a joint approach to part-based and monolithic object detection. This view of objects yields a better and more intuitive understanding of their structure, and, in accordance with observations in related work, we demonstrate improved reliability of our joint system.
A secondary focus of this work is the field of search and rescue robotics. Specifically, we are concerned with victim detection in search and rescue scenarios, which imposes demands beyond reliability. This setting requires real-time capable models, so we need efficient algorithms that do not sacrifice performance. We propose to leverage the complementarity of different sensors (visual, thermal and laser in this work) within a sensor fusion scheme to improve victim detection performance.

Alternative Abstract:

This dissertation addresses the localization of objects in complex scenes. Localizing objects in such scenes is of immense importance for our daily lives, because reliable systems could simplify our lives or guarantee greater safety (e.g. in driver assistance systems). Building on graphical models, we propose models that can provide a better understanding of the structure of objects. Graphical models are particularly well suited to this because they factorize into simpler modules. This thesis investigates discriminative, undirected graphical models (so-called conditional random fields). To handle the challenging scenes, extensions of the original models are proposed. These extensions enable a better understanding of object structure and achieve an empirically demonstrated improvement in accuracy. Specifically, the standard, locally restricted neighborhood dependency is replaced by arbitrary neighborhood relations. An efficient algorithm for selecting the neighborhoods is integrated into the graphical model. Furthermore, the modularity of graphical models is exploited and the individual random variables are interpreted as object parts. In this way, a local, part-based model is combined with a global object model in order to achieve, in line with related work, higher accuracy in object localization. A further focus of the thesis is the development of rescue robots. In addition to accuracy, this scenario places high demands on runtime. Only models that run in real time and directly on the robot are adequate here. This dissertation proposes a model based on several different sensors. Visual, thermal and laser sensors are used to develop fast yet reliable models.

Original language:

German

Uncontrolled Keywords:

Conditional Random Fields, Object Recognition

Alternative keywords: