Predicting the pose of objects from a single image is an important but difficult computer vision problem. Methods that predict a single point estimate handle objects with symmetries poorly and cannot represent uncertainty. Alternatively, some works predict a distribution over orientations in $\mathrm{SO}(3)$; however, training such models can be computation- and sample-inefficient. Instead, we propose a novel mapping of features from the image domain to the 3D rotation manifold. Our method then leverages $\mathrm{SO}(3)$-equivariant layers, which are more sample-efficient, and outputs a distribution over rotations that can be sampled at arbitrary resolution. We demonstrate the effectiveness of our method at object orientation prediction, achieving state-of-the-art performance on the popular PASCAL3D+ dataset. Moreover, we show that our method can model complex object symmetries without any modifications to the parameters or loss function. Code is available at https://dmklee.github.io/image2sphere.
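To make the output representation concrete, below is a minimal PyTorch sketch of one way to realize "a distribution over rotations that can be sampled at arbitrary resolution": score a grid of candidate rotation matrices against an image embedding and normalize with a softmax, so a denser grid at test time yields a finer-resolution distribution. This is not the paper's implementation (which uses $\mathrm{SO}(3)$-equivariant spherical convolutions); it is an implicit-PDF-style stand-in, and the names `random_rotation_grid` and `RotationDistributionHead` are hypothetical.

```python
import torch
import torch.nn as nn

def random_rotation_grid(n: int) -> torch.Tensor:
    """Uniform random rotations via unit quaternions (a stand-in for the
    equivolumetric SO(3) grids used in practice)."""
    q = torch.randn(n, 4)
    q = q / q.norm(dim=1, keepdim=True)
    w, x, y, z = q.unbind(dim=1)
    # Standard quaternion -> 3x3 rotation matrix formula, row-major.
    return torch.stack([
        1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y),
        2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x),
        2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y),
    ], dim=1).reshape(n, 3, 3)

class RotationDistributionHead(nn.Module):
    """Scores candidate rotations against an image embedding and returns a
    normalized log-distribution over the grid."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.rot_encoder = nn.Linear(9, feat_dim)  # embeds flattened 3x3 matrix

    def forward(self, img_feat: torch.Tensor, rotations: torch.Tensor):
        # img_feat: (B, feat_dim); rotations: (N, 3, 3)
        rot_feat = self.rot_encoder(rotations.flatten(1))  # (N, feat_dim)
        logits = img_feat @ rot_feat.T                     # (B, N)
        return logits.log_softmax(dim=-1)                  # log p(R | image)

# Usage: the grid size is independent of training, so the distribution can
# be queried at whatever resolution is needed.
head = RotationDistributionHead(feat_dim=64)
img_feat = torch.randn(2, 64)        # stand-in for CNN image features
grid = random_rotation_grid(4096)
log_p = head(img_feat, grid)         # (2, 4096)
```

Because the rotation grid is only an evaluation set, not a fixed output layer, symmetric objects are handled naturally: the softmax can place comparable mass on every rotation consistent with the image, with no change to the loss.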