Self-supervised monocular depth estimation has recently received much attention in computer vision. Most existing works in the literature aggregate multi-scale features for depth prediction via straightforward concatenation or element-wise addition; however, such feature aggregation operations generally neglect the contextual consistency between multi-scale features. To address this problem, we propose the Self-Distilled Feature Aggregation (SDFA) module, which simultaneously aggregates a pair of low-scale and high-scale features while maintaining their contextual consistency. The SDFA module employs three branches to learn three feature offset maps: one offset map for refining the input low-scale feature, and the other two for refining the input high-scale feature in a designed self-distillation manner. We then propose an SDFA-based network for self-supervised monocular depth estimation, and design a self-distilled training strategy to train the proposed network with the SDFA module. Experimental results on the KITTI dataset demonstrate that the proposed method outperforms the comparative state-of-the-art methods in most cases. The code is available at https://github.com/ZM-Zhou/SDFA-Net_pytorch.
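The three-branch design described above can be sketched roughly as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the layer choices, the warping via `grid_sample`, and the way the second high-scale branch serves as a self-distillation output are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDFA(nn.Module):
    """Hypothetical sketch of a Self-Distilled Feature Aggregation module.

    Three branches each predict a 2-channel offset map; the offsets are
    used to resample (refine) the input features before aggregation.
    """

    def __init__(self, channels):
        super().__init__()
        in_ch = 2 * channels  # concatenation of low- and high-scale features
        # one offset branch for the (upsampled) low-scale feature,
        # two for the high-scale feature (self-distillation pair)
        self.offset_low = nn.Conv2d(in_ch, 2, 3, padding=1)
        self.offset_high_a = nn.Conv2d(in_ch, 2, 3, padding=1)
        self.offset_high_b = nn.Conv2d(in_ch, 2, 3, padding=1)

    @staticmethod
    def _warp(feat, offset):
        # resample `feat` at positions shifted by the learned offsets
        n, _, h, w = feat.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
        grid = base + offset.permute(0, 2, 3, 1)  # (N, H, W, 2) in [-1, 1]
        return F.grid_sample(feat, grid, align_corners=True)

    def forward(self, low, high):
        # bring the low-scale feature to the high-scale resolution
        low_up = F.interpolate(low, size=high.shape[2:], mode="bilinear",
                               align_corners=True)
        cat = torch.cat((low_up, high), dim=1)
        low_ref = self._warp(low_up, self.offset_low(cat))
        high_a = self._warp(high, self.offset_high_a(cat))
        high_b = self._warp(high, self.offset_high_b(cat))
        # aggregate the refined features; the second high-scale branch
        # would supply the self-distillation signal during training
        return low_ref + high_a, high_b
```

For example, `SDFA(64)` applied to a `(1, 64, 24, 80)` low-scale feature and a `(1, 64, 48, 160)` high-scale feature returns two `(1, 64, 48, 160)` tensors: the aggregated feature and the auxiliary self-distillation output.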