In this paper we present a novel method for learning hierarchical representations of Markov decision processes. Our method works by partitioning the state space into subsets, and defines subtasks for performing transitions between the partitions. We formulate the problem of partitioning the state space as an optimization problem that can be solved using gradient descent given a set of sampled trajectories, making our method suitable for high-dimensional problems with large state spaces. We empirically validate the method, by showing that it can successfully learn a useful hierarchical representation in a navigation domain. Once learned, the hierarchical representation can be used to solve different tasks in the given domain, thus generalizing knowledge across tasks.
翻译:在本文中,我们提出了一个学习马尔科夫决策过程的等级代表的新方法。 我们的方法是将国家空间分成子集, 并定义进行分区之间过渡的子任务。 我们将国家空间分割问题描述为一个优化问题, 通过一组抽样轨迹来使用梯度梯度下降可以解决这个问题, 这使得我们的方法适合大型国家空间的高维问题。 我们通过实验验证方法, 表明它可以成功地在导航领域学习有用的等级代表。 一旦了解, 等级代表可以用来解决特定领域的不同任务, 从而将知识普及到不同任务中 。