Single-image pose estimation is a fundamental problem in many vision and robotics tasks, and existing deep learning approaches suffer from incompletely modeling and handling (i) uncertainty about the predictions and (ii) symmetric objects with multiple (sometimes infinitely many) correct poses. To this end, we introduce a method to estimate arbitrary, non-parametric distributions on SO(3). Our key idea is to represent the distributions implicitly, with a neural network that estimates the probability given the input image and a candidate pose. Grid sampling or gradient ascent can be used to find the most likely pose, but it is also possible to evaluate the probability at any pose, enabling reasoning about symmetries and uncertainty. This is the most general way of representing distributions on manifolds, and to showcase its rich expressive power, we introduce a dataset of challenging symmetric and nearly symmetric objects. We require no supervision on pose uncertainty: the model trains with only a single pose per example. Nonetheless, our implicit model is expressive enough to handle complex distributions over 3D poses while still obtaining accurate pose estimates in standard non-ambiguous environments, achieving state-of-the-art performance on the Pascal3D+ and ModelNet10-SO(3) benchmarks.
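As a rough illustration of the inference step described above (score many candidate poses, normalize, and read off both the most likely pose and the shape of the distribution), the following is a minimal sketch and not the authors' implementation. The function `toy_score` is a hypothetical stand-in for the trained network, hard-coded to mimic an object with a two-fold symmetry about the z-axis. The abstract does not specify how the grid is built, so uniform random sampling of SO(3) is swapped in here, and the names and parameters (`kappa`, the candidate count, the 1.0 rad radius) are illustrative assumptions.

```python
import numpy as np

def random_rotations(n, rng):
    """Draw n rotation matrices uniformly from SO(3) via normalized Gaussian quaternions."""
    q = rng.normal(size=(n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q.T
    return np.stack([
        np.stack([1 - 2*(y*y + z*z), 2*(x*y - z*w), 2*(x*z + y*w)], axis=-1),
        np.stack([2*(x*y + z*w), 1 - 2*(x*x + z*z), 2*(y*z - x*w)], axis=-1),
        np.stack([2*(x*z - y*w), 2*(y*z + x*w), 1 - 2*(x*x + y*y)], axis=-1),
    ], axis=1)  # shape (n, 3, 3)

def mode_traces(candidates, modes):
    """traces[i, k] = trace(modes[k]^T @ candidates[i]); equals 3 when they coincide."""
    return np.einsum('kab,iab->ik', modes, candidates)

def toy_score(candidates, modes, kappa=10.0):
    """Hypothetical stand-in for the network: unnormalized log-probability per candidate.

    A real model would take the image (or its features) and the candidate pose as input.
    """
    return kappa * mode_traces(candidates, modes).max(axis=1)

def normalize(scores):
    """Softmax over the candidate set, so the probabilities sum to 1."""
    p = np.exp(scores - scores.max())
    return p / p.sum()

rng = np.random.default_rng(0)
grid = random_rotations(50_000, rng)

# Two equally valid poses: identity and a 180-degree rotation about z (assumed symmetry).
modes = np.stack([np.eye(3), np.diag([-1.0, -1.0, 1.0])])

probs = normalize(toy_score(grid, modes))
best = grid[np.argmax(probs)]  # most likely pose among the candidates

# Geodesic angle (radians) from every candidate to each mode.
cosines = np.clip((mode_traces(grid, modes) - 1.0) / 2.0, -1.0, 1.0)
angles = np.arccos(cosines)

# Probability mass within 1.0 rad of each mode; a symmetric object should split it.
mass_near = np.array([probs[angles[:, k] < 1.0].sum() for k in range(2)])
print("mass near each mode:", mass_near)
```

Because the probability is available at every candidate, the two symmetric poses appear as separate modes with comparable mass, which a single-pose regressor could not express.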