We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical system-based methods have successfully demonstrated dynamic robot behaviors but have difficulty in generalizing to unseen configurations as well as learning from image inputs. Recent works approach this issue by using deep network policies and reparameterize actions to embed the structure of dynamical systems but still struggle in domains with diverse configurations of image goals, and hence, find it difficult to generalize. In this paper, we address this dichotomy by leveraging embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamical Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state-space and then distill them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation as well as reinforcement learning setups and achieve state-of-the-art results. Video results are at https://shikharbahl.github.io/hierarchical-ndps/
翻译:我们从高维图像输入中学习了现实世界中动态任务一般化为无形配置的问题。 非线性动态系统型方法的组合成功地展示了动态机器人行为,但很难将之推广为无形配置,也难以从图像输入中学习。最近的工作通过使用深网络政策和重新调整行动来解决这一问题,以嵌入动态系统结构,但仍在图像目标的多样化配置中挣扎,因此发现难以概括。在本文中,我们通过利用将动态系统结构嵌入等级深层次政策学习框架中的动态系统结构来解决这一问题。称为高级神经动态政策(H-NDPs)的组合成功地展示了动态机器人行为,但与其直接将深度动态系统推广为无形配置,并学习各种数据。H-NDPs形成一个课程,方法是在州空间小区域学习基于动态系统的地方政策,然后将之推向基于全球动态系统的政策,该政策只能从高维度图像中运作。H-NDPsil进一步提供平稳的轨迹,在现实世界中带来强大的安全效益。我们进行广泛的动态视频动态动态动态动态模拟实验,在真实的模拟中进行这种实验,我们正在以展示。