iDisc: Internal Discretization for Monocular Depth Estimation

Monocular depth estimation is fundamental for 3D scene understanding and downstream applications. However, even under the supervised setup, it is still challenging and ill-posed due to the lack of full geometric constraints. Although a scene can consist of millions of pixels, there are fewer high-level patterns. We propose iDisc to learn those patterns with internal discretized representations. The method implicitly partitions the scene into a set of high-level patterns. In particular, our new module, Internal Discretization (ID), implements a continuous-discrete-continuous bottleneck to learn those concepts without supervision. In contrast to state-of-the-art methods, the proposed model does not enforce any explicit constraints or priors on the depth output. The whole network with the ID module can be trained end-to-end, thanks to the bottleneck module based on attention. Our method sets the new state of the art with significant improvements on NYU-Depth v2 and KITTI, outperforming all published methods on the official KITTI benchmark. iDisc can also achieve state-of-the-art results on surface normal estimation. Further, we explore the model generalization capability via zero-shot testing. We observe the compelling need to promote diversification in the outdoor scenario. Hence, we introduce splits of two autonomous driving datasets, DDAD and Argoverse. Code is available at http://vis.xyz/pub/idisc .
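The abstract describes the ID module only at a high level: a continuous-discrete-continuous bottleneck built on attention. As a rough illustration of that idea, and not the authors' implementation, the sketch below compresses N pixel features into K summary vectors with one attention step, then lets every pixel read back from those K vectors with a second attention step. The names `slots`, `cross_attention`, and `bottleneck` are hypothetical, the sizes are arbitrary, and the learned projections, multiple heads, and iterative refinement that a real model would likely use are all omitted.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Single-head dot-product attention; keys double as values (assumption)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(dim)])
    return out


def bottleneck(pixel_features, slots):
    """Continuous -> discrete -> continuous, written as two attention steps."""
    # Step 1: K slots attend over N pixel features, giving K summary vectors.
    summaries = cross_attention(slots, pixel_features)
    # Step 2: each pixel attends over the K summaries to get a dense feature back.
    reconstructed = cross_attention(pixel_features, summaries)
    return summaries, reconstructed


random.seed(0)
N, K, D = 12, 3, 4  # pixels, slots, feature width (illustrative sizes only)
pixel_features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
slots = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]  # stand-in for learned queries

summaries, reconstructed = bottleneck(pixel_features, slots)
print(len(summaries), len(reconstructed), len(reconstructed[0]))  # 3 12 4
```

Because every output row is a softmax-weighted average of the K summaries, all information reaching the output has to pass through those K vectors, which is the bottleneck property the abstract refers to. Both steps are differentiable, which is consistent with the abstract's remark that the network can be trained end-to-end.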
