We present a novel online depth map fusion approach that learns depth map aggregation in a latent feature space. While previous fusion methods use an explicit scene representation like signed distance functions (SDFs), we propose a learned feature representation for the fusion. The key idea is a separation between the scene representation used for the fusion and the output scene representation, via an additional translator network. Our neural network architecture consists of two main parts: a depth and feature fusion sub-network, which is followed by a translator sub-network to produce the final surface representation (e.g. TSDF) for visualization or other tasks. Our approach is an online process, handles high noise levels, and is particularly able to deal with gross outliers common for photometric stereo-based depth maps. Experiments on real and synthetic data demonstrate improved results compared to the state of the art, especially in challenging scenarios with large amounts of noise and outliers.
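To make the described two-stage design concrete, the following is a minimal, illustrative sketch of a fusion sub-network operating on a latent feature grid, followed by a translator sub-network that decodes the features into a TSDF. The module names (`FusionNet`, `TranslatorNet`), layer sizes, residual update rule, and voxel-grid layout are assumptions for illustration only, not the authors' implementation.

```python
# Hedged sketch of the two-part architecture: latent feature fusion + translator.
# All architectural details below are assumed, not taken from the paper.
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Aggregates per-frame depth observations into a latent feature grid."""
    def __init__(self, feat_dim=8):
        super().__init__()
        # Consumes the current latent state plus one new observation channel
        # and predicts an update to the latent features (simple 3D conv stack).
        self.net = nn.Sequential(
            nn.Conv3d(feat_dim + 1, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(16, feat_dim, kernel_size=3, padding=1),
        )

    def forward(self, latent, observation):
        # latent:      (B, feat_dim, D, H, W) running scene state
        # observation: (B, 1, D, H, W) new depth evidence projected into the grid
        update = self.net(torch.cat([latent, observation], dim=1))
        return latent + update  # residual update of the latent state

class TranslatorNet(nn.Module):
    """Decodes the latent feature grid into an output representation (e.g. TSDF)."""
    def __init__(self, feat_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(feat_dim, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(16, 1, kernel_size=1),
        )

    def forward(self, latent):
        return torch.tanh(self.net(latent))  # truncated SDF values in [-1, 1]

# Online usage: fuse incoming depth maps one by one in latent space and
# translate to a TSDF only when an output surface is needed.
if __name__ == "__main__":
    B, F, D, H, W = 1, 8, 32, 32, 32
    fusion, translator = FusionNet(F), TranslatorNet(F)
    latent = torch.zeros(B, F, D, H, W)           # empty scene state
    for _ in range(3):                            # stream of incoming depth maps
        obs = torch.randn(B, 1, D, H, W)          # stand-in for projected depth
        latent = fusion(latent, obs)              # online latent fusion step
    tsdf = translator(latent)                     # final surface representation
    print(tsdf.shape)
```

The key point the sketch mirrors is the separation of concerns: the fusion module only ever touches the latent representation, while the translator alone is responsible for producing the explicit output representation used for visualization or downstream tasks.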