When manipulating a novel object with complex dynamics, a state representation is not always available, for example, for deformable objects. Learning both a representation and the dynamics from observations requires large amounts of data. We propose Learned Visual Similarity Predictive Control (LVSPC), a novel method for data-efficient learning to control systems with complex dynamics and high-dimensional state spaces from images. LVSPC leverages a given simple model approximation from which image observations can be generated. We use these images to train a perception model that estimates the simple model's state from observations of the complex system online. We then use data from the complex system to fit the parameters of the simple model and to learn where this model is inaccurate, also online. Finally, we use Model Predictive Control and bias the controller away from regions where the simple model is inaccurate and where the controller is therefore less reliable. We evaluate LVSPC on two tasks: manipulating a tethered mass and a rope. We find that our method performs comparably to state-of-the-art reinforcement learning methods while using an order of magnitude less data. LVSPC also completes the rope manipulation task on a real robot with an 80% success rate after only 10 trials, despite using a perception system trained only on images from simulation.
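To make the biasing step concrete, below is a minimal, hypothetical sketch of a random-shooting MPC loop that penalizes trajectories passing through regions where a learned classifier predicts the simple model is unreliable. All names here (simple_dynamics, trust_classifier, mistrust_weight) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch: MPC biased away from regions where the simple
# model is predicted to be inaccurate. Names are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def simple_dynamics(state, action):
    # Placeholder for the given simple model approximation.
    return state + 0.1 * action

def trust_classifier(state):
    # Placeholder for the learned accuracy model: ~1 where the simple
    # model has matched real-system data, ~0 where it has not.
    return float(np.linalg.norm(state) < 2.0)

def task_cost(state, goal):
    return float(np.sum((state - goal) ** 2))

def mpc_plan(state, goal, horizon=10, samples=256, mistrust_weight=10.0):
    """Random-shooting MPC: sample action sequences, roll out the simple
    model, and add a penalty in low-trust regions so the planner avoids
    states where the controller would be unreliable."""
    best_cost, best_action = np.inf, None
    for _ in range(samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, state.shape[0]))
        s, cost = state.copy(), 0.0
        for a in actions:
            s = simple_dynamics(s, a)
            # Bias the controller away from regions the model gets wrong.
            cost += task_cost(s, goal) + mistrust_weight * (1.0 - trust_classifier(s))
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]
    return best_action  # execute first action, then replan (receding horizon)

print(mpc_plan(np.zeros(2), np.array([1.0, 1.0])))
```

Under these assumptions, the penalty term plays the role of the learned inaccuracy estimate: trajectories through low-trust regions incur extra cost, so the planner prefers regions where the simple model's predictions can be trusted.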