One of the most challenging input settings for visual servoing arises when the initial and goal camera views are far apart. Such settings are difficult because the wide baseline can cause drastic changes in object appearance and introduce occlusions. This paper presents a novel self-supervised visual servoing method for wide-baseline images that does not require 3D ground truth supervision. Existing approaches that regress the absolute camera pose with respect to an object require 3D ground truth data of the object in the form of 3D bounding boxes or meshes. We learn a coherent visual representation by leveraging a geometric property called 3D equivariance: the representation is transformed in a predictable way as a function of the 3D transformation. To ensure that the feature space is faithful to the underlying geodesic space, a geodesic-preserving constraint is applied in conjunction with the equivariance constraint. We design a Siamese network that can effectively enforce these two geometric properties without requiring 3D supervision. With the learned model, the relative transformation can be inferred simply by following the gradient in the learned feature space, and this estimate serves as feedback for closed-loop visual servoing. Our method is evaluated on objects from the YCB dataset and meaningfully outperforms state-of-the-art approaches that use 3D supervision on a visual servoing (object alignment) task. It reduces the average distance error by more than 35% and achieves a success rate above 90% with a 3 cm error tolerance.
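To make the two training constraints described above concrete, the following is a minimal PyTorch-style sketch of a Siamese encoder trained with a 3D-equivariance loss and a geodesic-preserving loss. The encoder architecture, the feature-space transform `pose_to_feature_map`, the loss weights, and the use of relative poses from camera/robot motion as the self-supervision signal are all assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Shared-weight encoder applied to both the current and goal views."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, img):
        return self.backbone(img)

def equivariance_loss(f_a, f_b, rel_pose, pose_to_feature_map):
    """3D equivariance: transforming view A's feature by a (learned or fixed)
    map of the relative 3D transformation should reproduce view B's feature."""
    f_a_transformed = pose_to_feature_map(f_a, rel_pose)
    return F.mse_loss(f_a_transformed, f_b)

def geodesic_preserving_loss(f_a, f_b, geodesic_dist, scale=1.0):
    """Geodesic preservation: feature-space distance should remain
    proportional to the geodesic distance between the two camera poses."""
    feat_dist = torch.norm(f_a - f_b, dim=-1)
    return F.mse_loss(feat_dist, scale * geodesic_dist)

# Illustrative training step (relative poses T_ab and geodesic distances are
# assumed to come from camera/robot motion, not from 3D object ground truth):
# f_a, f_b = encoder(img_a), encoder(img_b)
# loss = equivariance_loss(f_a, f_b, T_ab, pose_to_feature_map) \
#        + 0.1 * geodesic_preserving_loss(f_a, f_b, geo_dist_ab)
```

At test time, per the abstract, the relative transformation is recovered by following the gradient of the feature-space distance to the goal view, and that estimate is used as feedback for the closed-loop servoing controller.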