3D inference from monocular vision using neural networks is an important research area in computer vision. Its applications are varied, and many proposed solutions have shown remarkable performance. Despite this effort, some questions remain unanswered, and several of them are fundamental. In this paper, I discuss a problem that I hope will come to be known as a generalization of the Blind Perspective-n-Point (Blind PnP) problem for object-driven 3D inference based on 2D representations. The key difference between this fundamental problem and the Blind PnP problem is that, in the fundamental problem, the 3D inference parameters are attached directly to the 3D points, and the camera concept is represented through the sharing of parameters among these points. By providing an explainable and robust gradient-descent solution based on 2D representations for an important special case of the problem, the paper opens up a new approach for using available information-based learning methods to solve problems related to 3D object pose estimation from 2D images.