3D shape reconstruction from a single image has been a long-standing problem in computer vision. Recent advances in 3D representation learning have produced pixel-aligned 3D reconstruction methods with impressive performance. However, it is often difficult to extract meaningful local image features that describe 3D point samples from their aligned pixels when large variations in occlusion, viewpoint, and appearance exist. In this paper, we study a general kernel that encodes local image features while accounting for the geometric relationships of point samples on the underlying surface. The kernel is derived from the proposed spatial pattern, in that its kernel points are obtained as the 2D projections of a set of 3D pattern points around each sample. Supported by the spatial pattern, the 2D kernel encodes geometric information that is essential for 3D reconstruction tasks, whereas traditional 2D kernels mainly capture appearance information. Furthermore, to enable the network to discover more adaptive spatial patterns that capture non-local contextual information, the spatial pattern is made deformable. Experimental results on both synthetic and real datasets demonstrate the superiority of the proposed method.
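The core mechanism, projecting 3D pattern points around a sample into the image to obtain kernel locations, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all function and variable names are assumptions, a pinhole camera model is assumed, and nearest-neighbour sampling stands in for the bilinear interpolation a real implementation would use.

```python
import numpy as np

def project(points_3d, K):
    # Pinhole projection: apply intrinsics, then divide by depth.
    uvw = (K @ points_3d.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def spatial_pattern_kernel(feat_map, sample_3d, offsets, weights, K):
    """Encode a local image feature for one 3D sample point (illustrative).

    feat_map : (H, W, C) image feature map
    sample_3d: (3,) 3D sample point in camera coordinates
    offsets  : (M, 3) 3D pattern-point offsets around the sample
              (learnable, and deformable in the paper's full model)
    weights  : (M, C) per-pattern-point kernel weights
    K        : (3, 3) camera intrinsic matrix
    """
    pattern_3d = sample_3d[None, :] + offsets   # 3D pattern points
    uv = project(pattern_3d, K)                 # their 2D kernel locations
    H, W, _ = feat_map.shape
    # Nearest-neighbour sampling, clamped to the image bounds;
    # bilinear sampling would be used in practice.
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    feats = feat_map[v, u]                      # (M, C) sampled features
    return (feats * weights).sum(axis=0)        # aggregated feature, (C,)
```

Because the kernel locations depend on the 3D geometry of the pattern points, the encoded feature varies with depth and viewpoint, which is what distinguishes it from a fixed-grid 2D convolution kernel.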