We propose a new approach to inverse reinforcement learning (IRL) based on the deep Gaussian process (deep GP) model, which is capable of learning complicated reward structures from few demonstrations. Our model stacks multiple latent GP layers to learn abstract representations of the state feature space, which are linked to the demonstrations through the maximum entropy learning framework. Incorporating the IRL engine into the nonlinear latent structure renders existing deep GP inference approaches intractable. To tackle this, we develop a non-standard variational approximation framework that extends previous inference schemes. This allows for an approximate Bayesian treatment of the feature space and guards against overfitting. By carrying out representation learning and inverse reinforcement learning simultaneously, our model outperforms state-of-the-art approaches, as we demonstrate with experiments on standard benchmarks ("object world", "highway driving") and a new benchmark ("binary world").
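To make the structure above concrete, the following is an illustrative sketch in our own notation (not necessarily the model's exact formulation): state features X pass through stacked latent GP layers to produce a reward, which enters a maximum entropy IRL likelihood over the demonstrated trajectories D; two layers are shown for concreteness.

% Illustrative sketch only; notation and layer count are assumptions for exposition.
\begin{align}
  f_1 &\sim \mathcal{GP}(0, k_1), \quad h = f_1(X)
      && \text{(latent representation layer)} \\
  f_2 &\sim \mathcal{GP}(0, k_2), \quad r = f_2(h)
      && \text{(reward layer)} \\
  p(\mathcal{D} \mid r) &= \prod_{\tau \in \mathcal{D}} \frac{1}{Z(r)}
      \exp\!\Big( \sum_{s \in \tau} r(s) \Big)
      && \text{(maximum entropy IRL likelihood)}
\end{align}

Because r depends on X through the nonlinear latent layers, the IRL likelihood couples to the deep GP posterior, which is what makes standard deep GP inference intractable and motivates the non-standard variational approximation described above.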