Learning 3D representations by fusing point cloud and multi-view data has proven to be effective. While prior works typically focus on exploiting the global features of the two modalities, in this paper we argue that more discriminative features can be derived by modeling "where to fuse". To investigate this, we propose a novel Correspondence-Aware Point-view Fusion Net (CAP-Net). The core element of CAP-Net is a module named Correspondence-Aware Fusion (CAF), which integrates the local features of the two modalities based on their correspondence scores. We further propose to filter out correspondence scores with low values to obtain salient local correspondences, which reduces redundancy in the fusion process. In CAP-Net, we employ CAF modules to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically in order to obtain more informative features. Comprehensive evaluations on popular 3D shape benchmarks covering 3D object classification and retrieval demonstrate the superiority of the proposed framework.
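To make the described fusion mechanism concrete, the sketch below illustrates one plausible reading of a correspondence-aware fusion step: local point and view features are scored pairwise, weak correspondences are filtered out, and the remaining scores weight the cross-modal aggregation. This is a minimal illustration under our own assumptions (module name `CorrespondenceAwareFusion`, the dot-product scoring, and the `score_threshold` parameter are hypothetical), not the authors' released implementation.

```python
# Hypothetical sketch of correspondence-aware fusion in PyTorch:
# pairwise correspondence scores between local features of two modalities,
# thresholding of weak scores, and score-weighted cross-modal aggregation.
import torch
import torch.nn as nn


class CorrespondenceAwareFusion(nn.Module):
    def __init__(self, dim: int, score_threshold: float = 0.1):
        super().__init__()
        self.score_threshold = score_threshold
        # Project both modalities into a shared embedding space before scoring.
        self.point_proj = nn.Linear(dim, dim)
        self.view_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(2 * dim, dim)

    def forward(self, point_feat: torch.Tensor, view_feat: torch.Tensor) -> torch.Tensor:
        """point_feat: (B, N, C) local point features; view_feat: (B, M, C) local view features."""
        q = self.point_proj(point_feat)   # (B, N, C)
        k = self.view_proj(view_feat)     # (B, M, C)

        # Correspondence scores between every point-region / view-region pair.
        scores = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)  # (B, N, M)

        # Filter out weak correspondences, keeping only salient ones, then renormalize.
        scores = torch.where(scores >= self.score_threshold, scores, torch.zeros_like(scores))
        scores = scores / scores.sum(dim=-1, keepdim=True).clamp(min=1e-6)

        # Aggregate view features onto points according to the retained scores.
        gathered = scores @ view_feat     # (B, N, C)
        return self.out_proj(torch.cat([point_feat, gathered], dim=-1))


if __name__ == "__main__":
    caf = CorrespondenceAwareFusion(dim=64)
    points = torch.randn(2, 1024, 64)     # local point-cloud features
    views = torch.randn(2, 12 * 49, 64)   # local multi-view features (e.g., 12 views x 7x7 patches)
    print(caf(points, views).shape)       # torch.Size([2, 1024, 64])
```

A bidirectional variant, as described in the abstract, would apply the same scoring in the view-to-point and point-to-view directions and repeat the operation at multiple feature scales.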