Multi-view projection methods have demonstrated their ability to reach state-of-the-art performance on 3D shape recognition. Those methods learn different ways to aggregate information from multiple views. However, the camera view-points for those views tend to be heuristically set and fixed for all shapes. To circumvent the lack of dynamism of current multi-view methods, we propose to learn those view-points. In particular, we introduce the Multi-View Transformation Network (MVTN) that regresses optimal view-points for 3D shape recognition, building upon advances in differentiable rendering. As a result, MVTN can be trained end-to-end along with any multi-view network for 3D shape classification. We integrate MVTN in a novel adaptive multi-view pipeline that can render either 3D meshes or point clouds. MVTN exhibits clear performance gains in the tasks of 3D shape classification and 3D shape retrieval without the need for extra training supervision. In these tasks, MVTN achieves state-of-the-art performance on ModelNet40, ShapeNet Core55, and the most recent and realistic ScanObjectNN dataset (up to 6% improvement). Interestingly, we also show that MVTN can provide network robustness against rotation and occlusion in the 3D domain. The code is available at https://github.com/ajhamdi/MVTN .
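To make the described pipeline concrete, below is a minimal sketch of an MVTN-style forward pass. It is not the paper's implementation (see the linked repository for that); `render_views` stands in for any differentiable renderer (e.g. one built on PyTorch3D), `mv_net` for any multi-view classifier (e.g. an MVCNN-style network), and all names, dimensions, and bounds are illustrative assumptions.

```python
# Sketch of an MVTN-style adaptive multi-view pipeline (illustrative, not the official code).
import torch
import torch.nn as nn

class ViewRegressor(nn.Module):
    """Regresses per-shape azimuth/elevation offsets from a global shape feature."""
    def __init__(self, feat_dim=256, n_views=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, n_views * 2), nn.Tanh(),  # bounded in [-1, 1]
        )
        self.max_offset = 90.0  # degrees; illustrative bound on view-point deviation

    def forward(self, shape_feat):
        # shape_feat: (B, feat_dim) coarse encoding of the 3D shape (e.g. from a point encoder)
        offsets = self.mlp(shape_feat) * self.max_offset
        return offsets.view(shape_feat.size(0), -1, 2)  # (B, n_views, [azimuth, elevation])

def mvtn_forward(shape_feat, shape_3d, base_views, view_regressor, render_views, mv_net):
    # base_views: (n_views, 2) canonical azimuth/elevation angles (e.g. evenly spaced on a circle)
    views = base_views.unsqueeze(0) + view_regressor(shape_feat)  # adaptive per-shape view-points
    images = render_views(shape_3d, views)   # (B, n_views, C, H, W); rendering is differentiable
    return mv_net(images)                     # class logits from the multi-view network

# Because the renderer is differentiable, the classification loss back-propagates through
# `render_views` into the view regressor, so view-points and classifier train end-to-end.
```

The key design point this sketch illustrates is that the view-points are a learned, per-shape output rather than a fixed heuristic configuration, which is what allows the multi-view network downstream to remain unchanged.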