Rehfeld, Timo (2018)
Combining Appearance, Depth and Motion for Efficient Semantic Scene Understanding.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication
Text: dissertation_timo_rehfeld_final_a4_color_refs_march10_2018_small.pdf (Accepted Version, 7 MB)
Copyright Information: CC BY-NC 4.0 International - Creative Commons, Attribution-NonCommercial
Item Type: Ph.D. Thesis
Type of entry: Primary publication
Title: Combining Appearance, Depth and Motion for Efficient Semantic Scene Understanding
Language: English
Referees: Roth, Prof. Dr. Stefan ; Rother, Prof. Dr. Carsten
Date: 2018
Place of Publication: Darmstadt
Date of oral examination: 26 September 2017
Abstract:

Computer vision plays a central role in autonomous vehicle technology because cameras are comparatively cheap and capture rich information about the environment. In particular, object classes, i.e. whether an object is a pedestrian, cyclist, or vehicle, can be extracted very well from image data. Environment perception in urban city centers is a highly challenging computer vision problem, as the environment is complex and cluttered: road boundaries and markings, traffic signs and lights, and many different kinds of objects that can mutually occlude each other must be detected in real time. Existing automotive vision systems do not scale easily to these requirements, because every problem or object class is treated independently. Scene labeling, on the other hand, which assigns object class information to every pixel in the image, is the most promising approach to avoid this overhead, since extracted features are shared across multiple classes. Compared to bounding box detectors, scene labeling additionally provides richer and denser information about the environment. However, most existing scene labeling methods require a large amount of computational resources, which makes them infeasible for real-time in-vehicle applications. In addition, in terms of bandwidth, a dense pixel-level representation is not ideal for transmitting the perceived environment to other modules of an autonomous vehicle, such as localization or path planning.

This dissertation addresses the scene labeling problem in an automotive context by constructing a scene labeling concept around the "Stixel World" model of Pfeiffer (2011), which compresses dense information about the environment into a set of small "sticks" that stand upright, perpendicular to the ground plane. This work provides the first extension of the existing Stixel formulation that takes learned dense pixel-level appearance features into account. In a second step, Stixels are used as primitive scene elements to build a highly efficient region-level labeling scheme. The last part of this dissertation proposes a model that combines both pixel-level and region-level scene labeling into a single model that yields state-of-the-art or better labeling accuracy and can be executed in real time at typical camera refresh rates. This work further investigates how existing depth information, i.e. from a stereo camera, can help to improve labeling accuracy and reduce runtime.
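To make the compression idea concrete, below is a minimal sketch of a Stixel-style representation, assuming a simple run-length grouping of per-pixel class labels within one image column. The `Stixel` fields, the grouping rule, and the median-depth choice are illustrative assumptions for this sketch, not the actual probabilistic formulation of Pfeiffer (2011) or of this thesis.

```python
# Minimal sketch: compressing one image column of per-pixel labels into
# vertical "sticks". All names and the grouping rule are illustrative
# assumptions, not the thesis's actual Stixel formulation.
from dataclasses import dataclass

import numpy as np


@dataclass
class Stixel:
    u: int         # image column index
    v_top: int     # top row of the stick (inclusive)
    v_bottom: int  # bottom row of the stick (inclusive)
    label: int     # semantic class id shared by the whole stick
    depth: float   # representative depth, here the median stereo depth


def column_to_stixels(u: int, labels: np.ndarray, depth: np.ndarray) -> list[Stixel]:
    """Collapse consecutive rows with the same class label into one stick."""
    stixels = []
    v_top = 0
    for v in range(1, len(labels) + 1):
        # close the current segment at a label change or at the column end
        if v == len(labels) or labels[v] != labels[v_top]:
            stixels.append(Stixel(
                u=u,
                v_top=v_top,
                v_bottom=v - 1,
                label=int(labels[v_top]),
                depth=float(np.median(depth[v_top:v])),
            ))
            v_top = v
    return stixels


# Example: a 6-row column with two segments (a distant car above, road below)
labels = np.array([1, 1, 1, 2, 2, 2])
depth = np.array([20.0, 20.2, 20.1, 9.0, 9.1, 9.0])
print(column_to_stixels(0, labels, depth))
```

Under this toy scheme, a 1024-row column typically collapses into a handful of sticks, so a 2048x1024 label image of roughly two million pixels reduces to on the order of ten thousand Stixels. This back-of-the-envelope arithmetic illustrates the bandwidth argument made in the abstract for transmitting the perceived environment to downstream modules.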
URN: urn:nbn:de:tuda-tuprints-73155
Classification DDC: 000 Generalities, computers, information > 004 Computer science
Divisions: 20 Department of Computer Science ; 20 Department of Computer Science > Visual Inference
Date Deposited: 26 Apr 2018 07:26
Last Modified: 09 Jul 2020 02:03
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/7315
PPN: 428835694