We present a novel online depth map fusion approach that learns to aggregate depth maps in a latent feature space. While previous fusion methods use an explicit scene representation such as signed distance functions (SDFs), we propose a learned feature representation for fusion. The key idea is to separate the scene representation used for fusion from the output scene representation via an additional translator network. Our neural network architecture consists of two main parts: a depth and feature fusion sub-network, followed by a translator sub-network that produces the final surface representation (e.g., TSDF) for visualization or other tasks. Our approach is real-time capable, handles high noise levels, and is particularly well suited to dealing with the gross outliers common in photometric stereo-based depth maps. Experiments on real and synthetic data demonstrate improved results compared to the state of the art, especially in challenging scenarios with large amounts of noise and outliers.
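To make the two-part design concrete, the sketch below illustrates one plausible way such a pipeline could be organized: a fusion network that updates a persistent latent feature volume with each incoming depth observation, and a translator network that decodes that latent volume into a TSDF only when an output is needed. This is a minimal illustrative sketch, not the authors' implementation; the class names (`FusionNet`, `TranslatorNet`), layer widths, latent dimensionality, and the dense voxel-grid update are all assumptions made for exposition.

```python
# Illustrative sketch only -- architecture details are assumed, not taken from the paper.
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Fuses a new depth observation into the latent scene volume (assumed design)."""
    def __init__(self, latent_dim=8, obs_dim=1):
        super().__init__()
        # Takes the current latent state plus the new observation,
        # and outputs the updated latent state.
        self.update = nn.Sequential(
            nn.Conv3d(latent_dim + obs_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, latent_dim, 3, padding=1),
        )

    def forward(self, latent, obs):
        return self.update(torch.cat([latent, obs], dim=1))

class TranslatorNet(nn.Module):
    """Decodes the latent scene volume into an output TSDF volume (assumed design)."""
    def __init__(self, latent_dim=8):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv3d(latent_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 1, 3, padding=1), nn.Tanh(),  # TSDF values in [-1, 1]
        )

    def forward(self, latent):
        return self.decode(latent)

# Toy online fusion loop over a small voxel grid (shapes chosen arbitrarily).
latent_dim, grid = 8, (32, 32, 32)
fusion, translator = FusionNet(latent_dim), TranslatorNet(latent_dim)
latent = torch.zeros(1, latent_dim, *grid)   # persistent latent scene state
for _ in range(3):                           # one step per incoming depth map
    obs = torch.randn(1, 1, *grid)           # stand-in for a projected depth observation
    latent = fusion(latent, obs)             # fusion happens in latent feature space
tsdf = translator(latent)                    # translate to TSDF only when an output is needed
```

One point this sketch is meant to highlight is the separation the abstract describes: the fusion loop never touches the TSDF directly, so the latent representation is free to encode confidence or outlier cues, and the translation to an explicit surface can be deferred or run at a different rate.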