Developing agents that can perform complex control tasks from high-dimensional observations such as pixels is challenging because dynamics are difficult to learn efficiently. In this work, we propose to learn forward and inverse dynamics in a fully unsupervised manner via contrastive estimation. Specifically, we train a forward dynamics model and an inverse dynamics model in the feature space of states and actions, using data collected from random exploration. Unlike most existing models, which are deterministic, our energy-based model accounts for the stochastic nature of agent-environment interactions. We demonstrate the efficacy of our approach across a variety of tasks, including goal-directed planning and imitation from observations. Project videos and code are at https://jianrenw.github.io/cloud/.
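To make the idea of contrastive estimation over dynamics concrete, the following is a minimal sketch and not the implementation used in this work. It assumes an InfoNCE-style objective with in-batch negatives, cosine-similarity scores, and linear maps standing in for the forward and inverse models. All names (`info_nce`, `W_fwd`, `W_inv`, `z_s`, `z_a`, `z_next`), the temperature, and the random features are hypothetical placeholders.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.1):
    """InfoNCE-style loss (assumed objective): row i of `queries` should score
    highest against row i of `keys`; the other rows act as negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                 # (B, B) similarity scores
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # cross-entropy on matched pairs

# Toy features standing in for encoded states, actions, and next states.
rng = np.random.default_rng(0)
B, D = 8, 16
z_s = rng.normal(size=(B, D))      # state features
z_a = rng.normal(size=(B, D))      # action features
z_next = rng.normal(size=(B, D))   # next-state features

# Hypothetical linear stand-ins for the forward and inverse models.
W_fwd = rng.normal(size=(2 * D, D)) * 0.1
W_inv = rng.normal(size=(2 * D, D)) * 0.1

# Forward: predict next-state features from (state, action) and contrast
# against the true next-state features.
fwd_loss = info_nce(np.concatenate([z_s, z_a], axis=1) @ W_fwd, z_next)

# Inverse: predict action features from (state, next state) and contrast
# against the true action features.
inv_loss = info_nce(np.concatenate([z_s, z_next], axis=1) @ W_inv, z_a)

print(float(fwd_loss), float(inv_loss))
```

Because the loss only ranks the observed transition above the alternatives, it does not commit to a single predicted outcome. That is one way, under the assumptions above, to read the contrast with deterministic regression-based models.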