The simplicity of the visual servoing approach makes it an attractive option for vision-based control of robots in many real-world applications. However, attaining precise alignment in unseen environments poses a challenge to existing visual servoing approaches. While classical approaches assume a perfect world, recent data-driven approaches struggle to generalize to novel environments. In this paper, we aim to combine the best of both worlds. We present a deep model predictive visual servoing framework that can achieve precise alignment with optimal trajectories and can generalize to novel environments. Our framework consists of a deep network for optical flow predictions, which is used along with a predictive model to forecast future optical flow. To generate an optimal set of velocities, we present a control network that can be trained on the fly without any supervision. Through extensive simulations on photo-realistic indoor scenes of the popular Habitat framework, we show a significant performance gain from the proposed formulation vis-a-vis recent state-of-the-art methods. Specifically, we demonstrate faster convergence and improved trajectory lengths over recent approaches.
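To make the described pipeline concrete, the following is a minimal sketch (not the authors' implementation) of a model-predictive visual servoing step: a flow network estimates the flow from the current to the goal view, a predictive flow model forecasts the flow induced by a candidate camera velocity, and a small control network is optimized on the fly so that its output velocity matches the goal flow. Here `flow_net` is a placeholder for any pretrained optical-flow estimator, `ControlNet` and `predicted_flow` are illustrative names, and a classical interaction-matrix (image-Jacobian) flow model with an assumed depth map and camera intrinsics stands in for the paper's learned predictive model.

```python
import torch
import torch.nn as nn

class ControlNet(nn.Module):
    """Tiny MLP mapping a fixed latent seed to a 6-DoF camera velocity;
    its weights are optimized online, without supervision."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 6))

    def forward(self, z):
        return self.net(z)

def predicted_flow(vel, depth, K):
    """Optical flow (H, W, 2) induced by a camera twist vel = (vx, vy, vz, wx, wy, wz),
    using the classical interaction-matrix approximation with per-pixel depth
    `depth` (H, W) and intrinsics K (3, 3). This is an assumed stand-in model."""
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = torch.meshgrid(torch.arange(W, dtype=torch.float32),
                          torch.arange(H, dtype=torch.float32), indexing="xy")
    x, y = (u - cx) / fx, (v - cy) / fy          # normalized image coordinates
    Z = depth.clamp(min=1e-3)
    vx, vy, vz, wx, wy, wz = vel
    fu = fx * (-vx / Z + x * vz / Z + x * y * wx - (1 + x ** 2) * wy + y * wz)
    fv = fy * (-vy / Z + y * vz / Z + (1 + y ** 2) * wx - x * y * wy - x * wz)
    return torch.stack([fu, fv], dim=-1)

def servo_step(flow_net, img, goal_img, depth, K, iters=50, lr=1e-3):
    """One MPC-style step: fit ControlNet online so that the flow predicted for
    its velocity matches the network-estimated flow to the goal image."""
    goal_flow = flow_net(img, goal_img).detach()  # (H, W, 2) flow to goal view
    ctrl, z = ControlNet(), torch.zeros(1, 16)
    opt = torch.optim.Adam(ctrl.parameters(), lr=lr)
    for _ in range(iters):
        vel = ctrl(z).squeeze(0)
        loss = nn.functional.l1_loss(predicted_flow(vel, depth, K), goal_flow)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ctrl(z).squeeze(0).detach()            # velocity command to execute
```

In practice, the returned velocity would be applied to the robot, a new image captured, and `servo_step` repeated until the flow to the goal image vanishes; the online optimization of the control network at each step is what removes the need for any supervised training of the controller.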