Detection, Tracking and Pose Estimation of People in Challenging Real-World Scenes.
Technische Universität, Darmstadt
[Ph.D. Thesis], (2011)
Available under Creative Commons Attribution Non-commercial No Derivatives, 2.5.
|Item Type:||Ph.D. Thesis|
|Title:||Detection, Tracking and Pose Estimation of People in Challenging Real-World Scenes|
In this thesis, we consider three challenging and longstanding problems in computer vision: people detection, people tracking and articulated pose estimation. Generic solutions to these problems are essential building blocks for understanding images containing people, an exciting and challenging task with numerous applications in automotive safety, robotic navigation, human-computer interaction, and automatic image indexing and retrieval. Indeed, human actions, intentions and emotions can often be inferred from accurate estimates of human body poses and their movement over time. However, until recently, accurate estimation of body poses has been possible only in controlled laboratory conditions, typically requiring multiple cameras and specialized motion capture equipment. In order to address this shortcoming, we propose algorithms capable of automatically finding people in uncontrolled outdoor environments, tracking them over time and estimating their body configurations. In the process, we also tackle several important technical challenges, including the large appearance variability of humans, the full and partial occlusions that frequently occur in typical street scenes, and ambiguities in 2D-to-3D lifting and data association.
Humans appear in images wearing a large variety of clothing, in a large number of possible body poses and visible from various viewpoints. Jointly, these factors create very complex appearance patterns that are difficult to model and to detect reliably. In order to deal with the large appearance variability, we propose an approach based on the pictorial structures paradigm in which we represent the human body as a flexible configuration of rigid body parts and model the appearance of each body part using local image descriptors and discriminative classifiers. We demonstrate the generality of our approach by successfully applying it to various human detection and pose estimation problems.
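To make the pictorial structures idea concrete, here is a minimal Python sketch of exact MAP inference in a tree-structured part model. It assumes each part already has a score map over a small grid of discrete locations (standing in for the outputs of per-part discriminative classifiers), and that each child part prefers to sit at a fixed offset from its parent, with a quadratic penalty for deviating from that offset. The function and parameter names (`pictorial_structures_map`, `unary`, `offset`, `stiffness`) are hypothetical; this is a generic illustration of the paradigm, not the implementation used in the thesis.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]  # (x, y) grid location of a part


def pictorial_structures_map(
    unary: Dict[str, Dict[Pos, float]],  # hypothetical per-part appearance score at each location
    parent: Dict[str, str],  # child part -> parent part (edges of the tree)
    offset: Dict[str, Pos],  # preferred displacement of each child from its parent
    stiffness: float,  # weight of the quadratic deformation penalty
    root: str,
) -> Tuple[float, Dict[str, Pos]]:
    """Hypothetical sketch: exact max-sum inference on a tree of parts."""
    children: Dict[str, List[str]] = {p: [] for p in unary}
    for child, par in parent.items():
        children[par].append(child)

    # argbest[c][lp]: best location of child c given that its parent is at lp
    argbest: Dict[str, Dict[Pos, Pos]] = {}

    def subtree_score(part: str) -> Dict[Pos, float]:
        # Own appearance score plus the best achievable score of each child subtree.
        score = dict(unary[part])
        for c in children[part]:
            child_score = subtree_score(c)
            ox, oy = offset[c]
            argbest[c] = {}
            for lp in score:
                best_val, best_loc = float("-inf"), lp
                for lc, s in child_score.items():
                    dx = lc[0] - lp[0] - ox
                    dy = lc[1] - lp[1] - oy
                    val = s - stiffness * (dx * dx + dy * dy)
                    if val > best_val:
                        best_val, best_loc = val, lc
                score[lp] += best_val
                argbest[c][lp] = best_loc
        return score

    root_score = subtree_score(root)
    root_loc = max(root_score, key=root_score.get)

    # Backtrack top-down to recover the best location of every part.
    config = {root: root_loc}
    stack = [root]
    while stack:
        p = stack.pop()
        for c in children[p]:
            config[c] = argbest[c][config[p]]
            stack.append(c)
    return root_score[root_loc], config
```

With a torso scoring highest at `(1, 1)` and a head that has a stronger but geometrically implausible response elsewhere, the deformation term pulls the head to the location just above the torso. The brute-force message computation above costs O(L²) per part for L locations; in practice the generalized distance transform of Felzenszwalb and Huttenlocher is commonly used to reduce this to O(L) for quadratic deformation costs.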
One of the goals of this work is to demonstrate the advantages of a tight coupling of people detection, pose estimation and tracking. Tracking of people in uncontrolled conditions is difficult not only due to appearance variability, but also due to frequent full and partial occlusions, which often happen when multiple people are present in the scene. The presence of multiple people also severely complicates data association between frames of the sequence. In order to address this challenge, we propose a tracking-by-detection framework that combines evidence from single-frame detections over several subsequent frames using a dynamical model of body articulations. We demonstrate the effectiveness of our tracking-by-detection approach by applying it to the problem of monocular 3D pose estimation of people in uncontrolled street environments.
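The tracking-by-detection idea of combining single-frame detections over several frames can be illustrated with a simple dynamic-programming (Viterbi) sketch. As a deliberate simplification, it replaces the articulated dynamical model described above with a plain constant-position motion penalty on 2D detection centers, and it assumes every frame contains at least one detection. The names `best_track`, `Detection` and `motion_weight` are hypothetical.

```python
from typing import List, Tuple

Detection = Tuple[float, float, float]  # hypothetical (x, y, detector score)


def best_track(
    frames: List[List[Detection]], motion_weight: float
) -> Tuple[float, List[int]]:
    """Hypothetical sketch: pick one detection per frame via Viterbi.

    Maximizes the summed detector scores minus a quadratic penalty on the
    displacement between consecutive frames (a constant-position motion model).
    Assumes every frame contains at least one detection.
    """
    score = [d[2] for d in frames[0]]  # best track score ending at each detection
    backptrs: List[List[int]] = []
    for t in range(1, len(frames)):
        prev = frames[t - 1]
        new_score: List[float] = []
        ptr: List[int] = []
        for x, y, s in frames[t]:
            # Best predecessor in the previous frame for this detection.
            best_val, best_j = float("-inf"), 0
            for j, (px, py, _) in enumerate(prev):
                val = score[j] - motion_weight * ((x - px) ** 2 + (y - py) ** 2)
                if val > best_val:
                    best_val, best_j = val, j
            new_score.append(best_val + s)
            ptr.append(best_j)
        score = new_score
        backptrs.append(ptr)

    # Backtrack from the best final detection to recover the full track.
    last = max(range(len(score)), key=score.__getitem__)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path
```

In this toy setting, a strong but isolated false positive in one frame is skipped when linking it would require an implausibly large jump, which illustrates why accumulating evidence over several frames can be more robust than relying on single-frame detections alone.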
|Place of Publication:||Darmstadt|
|Uncontrolled Keywords:||Image processing, human detection, pose estimation|
|Classification DDC:||000 General works, computer science, information science > 004 Computer science|
|Divisions:||20 Department of Computer Science|
|Date Deposited:||19 Oct 2011 14:58|
|Last Modified:||07 Dec 2012 12:01|
|Referees:||Roth, Prof. Stefan and Huttenlocher, Prof. Daniel and Schiele, Prof. Bernt|
|Refereed:||22 October 2010|