Learning an accurate model of the environment is essential for model-based control tasks. Existing methods in robotic visuomotor control usually learn from data with heavy labelling of actions, object entities, or locations, which can be demanding to obtain in many cases. To cope with this limitation, we propose a method, dubbed DMotion, that trains a forward model from video data only, by disentangling the motion of the controllable agent to model the transition dynamics. An object extractor and an interaction learner are trained in an end-to-end manner without supervision. The agent's motions are explicitly represented by spatial transformation matrices with physical meaning. In the experiments, DMotion achieves superior performance in learning an accurate forward model in a Grid World environment, as well as in a more realistic simulated robot control environment. With the accurately learned forward models, we further demonstrate their use in model predictive control as an effective approach for robotic manipulation.
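To make the idea of "motion represented by a spatial transformation matrix" concrete, the following is a minimal sketch, not the authors' implementation, of how a learned 2x3 affine matrix could be applied to an agent's feature map with a PyTorch-style spatial transformer. The function and variable names (apply_motion, agent_feat, dx, dy) are illustrative assumptions; DMotion's actual architecture may differ.

```python
import torch
import torch.nn.functional as F


def apply_motion(agent_feat: torch.Tensor, dx: float, dy: float) -> torch.Tensor:
    """Translate an agent feature map (N, C, H, W) by (dx, dy) in
    normalized image coordinates using an affine spatial transform.

    This is only a sketch of the general technique (spatial transformer
    sampling), not DMotion's exact forward model.
    """
    n = agent_feat.size(0)
    # 2x3 affine matrix encoding a pure translation; rotation and scale
    # would occupy the leading 2x2 block. The signs are negated because
    # grid_sample maps output coordinates back to input coordinates.
    theta = torch.tensor([[1.0, 0.0, -dx],
                          [0.0, 1.0, -dy]]).repeat(n, 1, 1)
    grid = F.affine_grid(theta, list(agent_feat.shape), align_corners=False)
    return F.grid_sample(agent_feat, grid, align_corners=False)


# Hypothetical usage: shift the agent's features right by 10% of the width.
features = torch.randn(1, 16, 64, 64)
moved = apply_motion(features, dx=0.2, dy=0.0)
```

Because the transformation matrix has a direct physical reading (translation, rotation, scale of the agent), a forward model built on it can be rolled out inside a model predictive control loop, where candidate action sequences are scored by the predicted future states.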