DENAO: Monocular Depth Estimation Network With Auxiliary Optical Flow

IEEE Trans Pattern Anal Mach Intell. 2021 Aug;43(8):2598-2610. doi: 10.1109/TPAMI.2020.2977021. Epub 2021 Jul 1.

Abstract

Estimating depth from multi-view images captured by a localized monocular camera is an essential task in computer vision and robotics. In this study, we demonstrate that training a convolutional neural network (CNN) for depth estimation together with an auxiliary optical flow network and the epipolar geometry constraint greatly benefits the depth estimation task, yielding large improvements in both accuracy and speed. Our architecture is composed of two tightly coupled encoder-decoder networks, i.e., an optical flow net and a depth net. Its core components are a series of exchange blocks between the two nets and an epipolar feature layer in the optical flow net, which together improve the predictions of both depth and optical flow. The architecture accepts an arbitrary number of multi-view images as input, with a time cost for optical flow and depth estimation that grows linearly with the number of images. Experimental results on five public datasets demonstrate that our method, named DENAO, runs at 38.46 fps on a single Nvidia TITAN Xp GPU, which is 5.15× to 142× faster than state-of-the-art depth estimation methods. Meanwhile, DENAO concurrently outputs predictions of both depth and optical flow, and performs on par with or better than state-of-the-art depth estimation and optical flow methods.
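
As an illustration of the epipolar geometry constraint mentioned above, the following is a minimal sketch of the standard two-view relation for a camera with known pose, not DENAO's epipolar feature layer, whose details the abstract does not give. It assumes a pinhole camera and the convention X2 = R X1 + t between the two camera frames; the function names, intrinsics, and sample values are hypothetical and chosen only for this example. A correct correspondence satisfies x2^T E x1 = 0 with E = [t]x R, so the residual indicates how far an optical-flow match lies from its epipolar line.

```python
# Illustrative sketch only: standard epipolar constraint for a known relative
# pose, assuming X2 = R @ X1 + t. All names and numbers are hypothetical.

def skew(t):
    """Return the 3x3 cross-product matrix [t]x, so skew(t) @ v == t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def essential_matrix(rotation, translation):
    """Essential matrix E = [t]x R for the pose X2 = R X1 + t."""
    return matmul(skew(translation), rotation)

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (camera frame) to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def normalize(pixel, fx, fy, cx, cy):
    """Convert a pixel to homogeneous normalized image coordinates."""
    u, v = pixel
    return [(u - cx) / fx, (v - cy) / fy, 1.0]

def epipolar_residual(e, x1, x2):
    """Algebraic residual x2^T E x1; zero for a geometrically exact match."""
    ex1 = matvec(e, x1)
    return sum(x2[i] * ex1[i] for i in range(3))

if __name__ == "__main__":
    fx = fy = 500.0            # hypothetical focal lengths in pixels
    cx, cy = 320.0, 240.0      # hypothetical principal point
    rotation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    translation = [0.1, 0.0, 0.0]   # sideways camera motion

    point_cam1 = [0.5, -0.2, 4.0]
    rotated = matvec(rotation, point_cam1)
    point_cam2 = [rotated[i] + translation[i] for i in range(3)]

    pixel1 = project(point_cam1, fx, fy, cx, cy)
    pixel2 = project(point_cam2, fx, fy, cx, cy)
    pixel2_bad = (pixel2[0], pixel2[1] + 5.0)  # match shifted off the epipolar line

    e = essential_matrix(rotation, translation)
    x1 = normalize(pixel1, fx, fy, cx, cy)
    print("exact match residual:", epipolar_residual(e, x1, normalize(pixel2, fx, fy, cx, cy)))
    print("shifted match residual:", epipolar_residual(e, x1, normalize(pixel2_bad, fx, fy, cx, cy)))
```

In this sketch the exact match gives a residual that is zero up to floating-point error, while the match shifted by five pixels does not. A constraint of this kind is what allows known camera poses to restrict where an optical-flow correspondence may lie.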