In recent years, monocular depth estimation has been widely applied to understanding the surrounding 3D environment and has made great progress. However, recovering depth directly from a single image is an ill-posed problem. With the rapid development of deep learning, this problem has become tractable. Although more and more approaches have been proposed, most existing methods inevitably lose detail due to continuous downsampling when mapping from RGB space to depth space. To this end, we design a Multi-scale Features Network (MSFNet), which consists of an Enhanced Diverse Attention (EDA) module and an Upsample-Stage Fusion (USF) module. The EDA module employs spatial attention to learn significant spatial information, while the USF module complements low-level detail information with high-level semantic information from the perspective of multi-scale feature fusion, improving prediction quality. In addition, since simple samples are usually fitted well early in training, hard samples are difficult to converge. Therefore, we design a batch-loss that assigns larger loss factors to the harder samples in a batch. Experiments on the NYU-Depth V2 dataset and the KITTI dataset demonstrate that our proposed approach is competitive with state-of-the-art methods in both qualitative and quantitative evaluation.
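The abstract states only that the EDA module "employs spatial attention"; its exact architecture is not given here. For orientation, the following is a minimal sketch of a generic spatial-attention block (CBAM-style), not the paper's EDA design: channel statistics are pooled per pixel, a convolution produces a per-location weight map, and the input features are rescaled by it.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Generic spatial-attention sketch (assumed, not the paper's exact EDA module)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # Maps the 2-channel pooled statistics to a single attention map.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)      # (B,1,H,W) channel-average pooling
        mx, _ = x.max(dim=1, keepdim=True)     # (B,1,H,W) channel-max pooling
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                        # emphasize informative spatial locations
```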
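Similarly, the batch-loss is only described as assigning larger loss factors to the harder samples within a batch. A minimal sketch of that idea, assuming a PyTorch setting and a hypothetical strength hyperparameter alpha (the paper's exact formulation may differ):

```python
import torch

def batch_loss(per_sample_loss: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Hypothetical batch-loss sketch: reweight each sample's loss by its
    hardness relative to the batch, so harder samples get larger factors.

    per_sample_loss: shape (B,), one scalar loss per sample in the batch.
    alpha: assumed hyperparameter controlling the reweighting strength.
    """
    # Samples with above-average loss (harder samples) receive weights > 1.
    weights = 1.0 + alpha * (per_sample_loss / per_sample_loss.mean() - 1.0)
    # Detach and clamp so the reweighting itself carries no gradient.
    weights = weights.detach().clamp(min=0.0)
    return (weights * per_sample_loss).mean()

# Usage example with illustrative per-image depth losses:
losses = torch.tensor([0.2, 0.5, 1.3, 0.4], requires_grad=True)
total = batch_loss(losses)  # the 1.3 sample contributes with the largest factor
```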