Learning or identifying dynamics from a sequence of high-dimensional observations is a difficult challenge in many domains, including reinforcement learning and control. The problem has recently been studied from a generative perspective through latent dynamics: high-dimensional observations are embedded into a lower-dimensional space in which the dynamics can be learned. Despite some successes, latent dynamics models have not yet been applied to real-world robotic systems, where learned representations must be robust to a variety of perceptual confounds and noise sources not seen during training. In this paper, we present a method to jointly learn a latent state representation and the associated dynamics that are amenable to long-term planning and closed-loop control under perceptually difficult conditions. As our main contribution, we describe how our representation captures a notion of heteroscedastic, or input-specific, uncertainty at test time by detecting novel or out-of-distribution (OOD) inputs. We present results from prediction and control experiments on two image-based tasks: a simulated pendulum balancing task and a real-world robotic manipulator reaching task. We demonstrate that, in the presence of varying degrees of input degradation, our model produces significantly more accurate predictions and exhibits improved control performance compared to a model that assumes only homoscedastic uncertainty.