Hur, Junhwa (2022)
Joint Motion, Semantic Segmentation, Occlusion, and Depth Estimation.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00021624
Ph.D. Thesis, Primary publication, Publisher's Version
junhwa_hur_phd_dissertation.pdf (32 MB). Copyright Information: CC BY-SA 4.0 International (Creative Commons Attribution-ShareAlike).
| Item Type: | Ph.D. Thesis |
| --- | --- |
| Type of entry: | Primary publication |
| Title: | Joint Motion, Semantic Segmentation, Occlusion, and Depth Estimation |
| Language: | English |
| Referees: | Roth, Prof. Ph.D. Stefan ; Ramanan, Prof. Ph.D. Deva |
| Date: | 2022 |
| Place of Publication: | Darmstadt |
| Collation: | xviii, 154 pages |
| Date of oral examination: | 18 May 2022 |
| DOI: | 10.26083/tuprints-00021624 |
| Abstract: | Visual scene understanding is one of the most important components of autonomous navigation. It comprises multiple computer vision tasks, such as recognizing objects, perceiving their 3D structure, and analyzing their motion, all of which have seen remarkable progress in recent years. However, most earlier studies have explored these components individually, so the potential benefits of exploiting the relationships between them have been overlooked. In this dissertation, we explore the relationships between these tasks and the benefits that can be discovered by formulating multiple tasks jointly. A joint formulation allows each task to exploit the other as an additional input cue and ultimately improves the accuracy of both. We first present the joint estimation of semantic segmentation and optical flow. Though not directly related, the two tasks provide important cues to each other in the temporal domain: semantic information indicates the plausible physical motion of its associated pixels, and accurate pixel-level temporal correspondences enhance the temporal consistency of semantic segmentation. We demonstrate that the joint formulation improves the accuracy of both tasks. Second, we investigate the mutual relationship between optical flow and occlusion estimation. Unlike most previous methods, which treat occlusions as outliers, we highlight the importance of jointly reasoning about the two tasks in the optimization. Specifically, by utilizing forward-backward consistency and occlusion-disocclusion symmetry in the energy formulation, we demonstrate that the joint formulation brings substantial performance benefits for both tasks on standard benchmarks. We further demonstrate that optical flow and occlusion can exploit their mutual relationship in convolutional neural networks as well. We propose to iteratively and residually refine the estimates using a single weight-shared network, which substantially improves accuracy without adding network parameters, and can even reduce them depending on the backbone network. Next, we propose joint depth and 3D scene flow estimation from only two temporally consecutive monocular images. We solve this ill-posed problem by taking an inverse-problem view: we design a single convolutional neural network that simultaneously estimates depth and 3D motion from a classical optical flow cost volume. Through self-supervised learning, we leverage unlabeled data for training, avoiding the shortage of 3D annotations for direct supervision. Finally, we conclude by summarizing the contributions and discussing future perspectives that could resolve the remaining challenges of our approaches. |
| Status: | Publisher's Version |
| URN: | urn:nbn:de:tuda-tuprints-216242 |
| Classification DDC: | 000 Generalities, computers, information > 004 Computer science |
| Divisions: | 20 Department of Computer Science > Visual Inference |
| Date Deposited: | 21 Jul 2022 12:15 |
| Last Modified: | 16 Dec 2022 07:34 |
| URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/21624 |
| PPN: | 497916274 |
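
The abstract's second contribution reasons jointly about optical flow and occlusion via forward-backward consistency. Below is a minimal sketch of such a consistency check, assuming a common thresholding heuristic; the function name `occlusion_from_fb_consistency` and the thresholds `alpha` and `beta` are illustrative stand-ins, not the thesis's energy-based joint formulation:

```python
import numpy as np

def occlusion_from_fb_consistency(flow_fw, flow_bw, alpha=0.01, beta=0.5):
    """Estimate an occlusion mask via forward-backward consistency.

    A pixel is marked occluded when following the forward flow and then
    the backward flow does not return it near its origin.
    flow_fw, flow_bw: (H, W, 2) arrays of (x, y) displacements.
    Thresholds follow the common heuristic
    |f_fw + f_bw(warped)|^2 > alpha * (|f_fw|^2 + |f_bw|^2) + beta.
    """
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Target coordinates after applying the forward flow.
    xt = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, H - 1)
    # Backward flow sampled at the forward-warped positions
    # (nearest-neighbour sampling for simplicity).
    flow_bw_warped = flow_bw[yt, xt]
    # Round-trip residual: near zero for consistent, visible pixels.
    diff = flow_fw + flow_bw_warped
    sq_diff = (diff ** 2).sum(-1)
    sq_mag = (flow_fw ** 2).sum(-1) + (flow_bw_warped ** 2).sum(-1)
    return sq_diff > alpha * sq_mag + beta  # True where occluded
```

Pixels whose round trip fails the threshold are flagged as occluded; the thesis instead treats this symmetry as part of the joint optimization rather than a post-hoc filter.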
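The third contribution describes iterative residual refinement with a single weight-shared network. The following sketch illustrates why this adds no parameters: one decoder is reused at every step and predicts a residual update to the current estimate. The class name `IterativeResidualRefiner` and the layer sizes are assumptions for illustration, not the thesis architecture:

```python
import torch
import torch.nn as nn

class IterativeResidualRefiner(nn.Module):
    """Sketch of iterative residual refinement: one weight-shared
    decoder is applied repeatedly, each step predicting a residual
    correction to the current flow estimate instead of the flow itself.
    """
    def __init__(self, feat_ch=32, steps=3):
        super().__init__()
        self.steps = steps
        # A single decoder reused at every step -> no parameter growth
        # as the number of refinement iterations increases.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_ch + 2, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, padding=1),
        )

    def forward(self, features):
        B, _, H, W = features.shape
        flow = features.new_zeros(B, 2, H, W)  # initial estimate
        for _ in range(self.steps):
            # Condition the shared decoder on the current estimate
            # and apply its output as a residual update.
            flow = flow + self.decoder(torch.cat([features, flow], 1))
        return flow

# Usage: refine a flow field from (hypothetical) image features.
refiner = IterativeResidualRefiner()
flow = refiner(torch.randn(1, 32, 64, 64))  # (1, 2, 64, 64)
```

Because the decoder weights are shared across iterations, adding refinement steps improves accuracy at constant model size, which matches the parameter-saving behavior the abstract describes.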