Structure from Articulated Motion: Accurate and Stable Monocular 3D Reconstruction without Training Data

Onorina Kovalenko; Vladislav Golyanik; Jameel Malik; Ahmed Elhayek; Didier Stricker

doi:10.3390/s19204603

Sensors (Basel). 2019 Oct 22;19(20):4603. doi: 10.3390/s19204603.

Authors

Onorina Kovalenko¹, Vladislav Golyanik², Jameel Malik³ ⁴ ⁵, Ahmed Elhayek⁶ ⁷, Didier Stricker⁸ ⁹

Affiliations

¹ Department Augmented Vision, German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany.
² Department of Computer Graphics, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.
³ Department Augmented Vision, German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany.
⁴ Department of Computer Science, University of Kaiserslautern, 67663 Kaiserslautern, Germany.
⁵ School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), 44000 Islamabad, Pakistan.
⁶ Department Augmented Vision, German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany.
⁷ Department of Computer Science, University of Prince Mugrin (UPM), 20012 Madinah, Saudi Arabia.
⁸ Department Augmented Vision, German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany.
⁹ Department of Computer Science, University of Kaiserslautern, 67663 Kaiserslautern, Germany.

Abstract

Recovery of articulated 3D structure from 2D observations is a challenging computer vision problem with many applications. Current learning-based approaches achieve state-of-the-art accuracy on public benchmarks but are restricted to the specific types of objects and motions covered by the training datasets. Model-based approaches do not rely on training data but show lower accuracy on these datasets. In this paper, we introduce a model-based method called Structure from Articulated Motion (SfAM), which can recover multiple object and motion types without training on extensive data collections. At the same time, it performs on par with learning-based state-of-the-art approaches on public benchmarks and outperforms previous non-rigid structure from motion (NRSfM) methods. SfAM is built upon a general-purpose NRSfM technique and integrates a soft spatio-temporal constraint on the bone lengths. We use an alternating optimization strategy to recover the optimal geometry (i.e., bone proportions) together with the 3D joint positions by enforcing bone-length consistency over a series of frames. SfAM is highly robust to noisy 2D annotations, generalizes to arbitrary objects, and does not rely on training data, as shown in extensive experiments on public benchmarks and real video sequences. We believe that it brings a new perspective to the domain of monocular 3D recovery of articulated structures, including human motion capture.
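
The bone-length consistency idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the SfAM formulation: the abstract does not give the energy terms, so the median-based target lengths, the squared-deviation penalty, and the function names `bone_lengths`, `bone_length_penalty`, and `enforce_bone_lengths` are all assumptions made for illustration. The last function is a hard projection standing in for one step of an alternation, whereas the paper describes a soft constraint.

```python
import numpy as np

def bone_lengths(joints, bones):
    """Per-frame bone lengths.

    joints: (F, J, 3) array of 3D joint positions over F frames.
    bones:  list of (parent, child) joint index pairs.
    Returns an (F, B) array.
    """
    parents = np.array([p for p, _ in bones])
    children = np.array([c for _, c in bones])
    return np.linalg.norm(joints[:, children] - joints[:, parents], axis=-1)

def bone_length_penalty(joints, bones):
    """Squared deviation of each bone from a per-bone target length.

    Assumption: the target is the median length over all frames.
    Returns (penalty, target) with target of shape (B,).
    """
    lengths = bone_lengths(joints, bones)
    target = np.median(lengths, axis=0)
    return float(np.sum((lengths - target) ** 2)), target

def enforce_bone_lengths(joints, bones, target, eps=1e-9):
    """Rescale every bone to its target length, keeping its direction.

    Assumption: `bones` is ordered from the root outwards, so each
    parent is already placed when its child is processed.
    """
    out = joints.copy()
    for b, (p, c) in enumerate(bones):
        direction = joints[:, c] - joints[:, p]  # original (F, 3) bone vectors
        norm = np.linalg.norm(direction, axis=-1, keepdims=True)
        out[:, c] = out[:, p] + target[b] * direction / np.maximum(norm, eps)
    return out
```

In a full pipeline, a step like this would alternate with a 3D reconstruction step that fits the 2D observations; that part is omitted here because the abstract does not specify it.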

Keywords: articulated structure recovery; human pose estimation; structure from motion.

Grants and funding

01IW18002/Bundesministerium für Bildung und Forschung