We introduce CGA-PoseNet, which uses the 1D-Up approach to Conformal Geometric Algebra (CGA) to represent rotations and translations with a single mathematical object, the motor, for camera pose regression. We do so starting from PoseNet, which successfully predicts camera poses from small datasets of RGB frames. State-of-the-art methods, however, require expensive tuning to balance the orientational and translational components of the camera pose. This is usually done through a complex, ad hoc loss function to be minimized, and in some cases it also requires 3D points in addition to images. Our approach has the advantage of unifying the camera position and orientation through the motor. Consequently, the network searches for a single object which lives in a well-behaved 4D space with a Euclidean signature. This means that we can address the case of image-only datasets and work efficiently with a simple loss function, namely the mean squared error (MSE) between the predicted and ground-truth motors. We show that it is possible to achieve high-accuracy camera pose regression with a significantly simpler problem formulation. This 1D-Up approach to CGA can be employed to overcome the dichotomy between translational and orientational components in camera pose regression in a compact and elegant way.
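The difference between the two loss formulations described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that a motor is stored as a flat vector of 8 coefficients (the even-grade part of a 4D algebra: 1 scalar, 6 bivector and 1 pseudoscalar component) and that the baseline is a PoseNet-style two-term loss with a hand-tuned weight. The names `motor_mse`, `weighted_pose_loss` and `beta` are hypothetical.

```python
import numpy as np

# Assumption: a motor is stored as 8 coefficients, the even-grade
# elements of a 4D algebra (1 scalar + 6 bivectors + 1 pseudoscalar).
MOTOR_DIM = 8


def motor_mse(pred, target):
    """Mean squared error over all motor coefficients (single term, no weight)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape or pred.shape[-1] != MOTOR_DIM:
        raise ValueError("expected matching arrays with 8 coefficients per motor")
    return float(np.mean((pred - target) ** 2))


def weighted_pose_loss(pos_pred, pos_true, quat_pred, quat_true, beta):
    """Illustrative two-term loss, shown only for contrast.

    Position error plus beta times orientation error; beta is the
    hyperparameter that has to be tuned to balance the two terms.
    """
    pos_pred = np.asarray(pos_pred, dtype=float)
    pos_true = np.asarray(pos_true, dtype=float)
    quat_pred = np.asarray(quat_pred, dtype=float)
    quat_true = np.asarray(quat_true, dtype=float)
    # Normalise the predicted quaternion before comparing orientations.
    quat_pred = quat_pred / np.linalg.norm(quat_pred, axis=-1, keepdims=True)
    pos_err = np.linalg.norm(pos_pred - pos_true, axis=-1)
    rot_err = np.linalg.norm(quat_pred - quat_true, axis=-1)
    return float(np.mean(pos_err + beta * rot_err))


if __name__ == "__main__":
    identity = np.array([[1.0, 0, 0, 0, 0, 0, 0, 0]])  # scalar part only
    guess = np.array([[0.5, 0, 0, 0, 0, 0, 0, 0]])
    print(motor_mse(guess, identity))
```

The motor loss has no weighting term: position and orientation are encoded in the same 8 coefficients, so a single MSE covers both. The two-term loss needs `beta` to be chosen per dataset.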