Deep Reinforcement Learning has shown its ability to solve complicated problems directly from high-dimensional observations. However, in end-to-end settings, Reinforcement Learning algorithms are not sample-efficient and require long training times and large quantities of data. In this work, we propose a framework for sample-efficient Reinforcement Learning that takes advantage of state and action representations to transform a high-dimensional problem into a low-dimensional one. Moreover, we seek the optimal policy mapping latent states to latent actions. Because the policy is now learned on abstract representations, we enforce, via auxiliary loss functions, that such a policy can be lifted back to the original problem domain. Results show that the proposed framework can efficiently learn low-dimensional and interpretable state and action representations together with the optimal latent policy.
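As a rough illustration of the structure described above, the following is a minimal sketch, assuming a PyTorch-style instantiation: a state encoder mapping observations to latent states, a latent policy acting purely in latent space, an action decoder lifting latent actions back to the original action space, and one possible auxiliary loss enforcing that the lifted policy reproduces actions in the original domain. The module names (StateEncoder, ActionDecoder, LatentPolicy) and the specific behavioral-cloning-style auxiliary term are hypothetical choices for illustration, not the paper's actual architecture or losses.

```python
# Hypothetical sketch of the framework's components (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateEncoder(nn.Module):
    """Maps high-dimensional observations s to low-dimensional latent states z_s."""
    def __init__(self, obs_dim, latent_state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_state_dim))
    def forward(self, s):
        return self.net(s)

class LatentPolicy(nn.Module):
    """Policy learned entirely in latent space: z_s -> z_a."""
    def __init__(self, latent_state_dim, latent_action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_state_dim, 64), nn.Tanh(),
                                 nn.Linear(64, latent_action_dim))
    def forward(self, z_s):
        return self.net(z_s)

class ActionDecoder(nn.Module):
    """Lifts latent actions z_a back to the original action space."""
    def __init__(self, latent_action_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_action_dim, 128), nn.ReLU(),
                                 nn.Linear(128, action_dim))
    def forward(self, z_a):
        return self.net(z_a)

def auxiliary_lift_loss(encoder, policy, decoder, obs, target_actions):
    """One possible auxiliary term: the decoded latent policy should match
    reference actions in the original action space (an assumed example)."""
    z_s = encoder(obs)
    a_hat = decoder(policy(z_s))
    return F.mse_loss(a_hat, target_actions)
```

In this sketch the latent policy never sees the raw observation or action spaces; the auxiliary loss ties the composed map encoder-policy-decoder back to the original domain, which is the sense in which the latent policy is "lifted".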