Deep Reinforcement Learning (RL) is unquestionably a robust framework to train autonomous agents in a wide variety of disciplines. However, traditional deep and shallow model-free RL algorithms suffer from low sample efficiency and inadequate generalization for sparse state spaces. The options framework with temporal abstractions is perhaps the most promising method to solve these problems, but it still has noticeable shortcomings. It only guarantees local convergence, and it is challenging to automatically learn initiation and termination conditions, which in practice are commonly hand-crafted. Our proposal, the Deep Variational Q-Network (DVQN), combines deep generative and reinforcement learning. The algorithm finds good policies from a Gaussian-distributed latent space, which is especially useful for defining options. The DVQN algorithm uses an MSE reconstruction loss with KL-divergence regularization, combined with traditional Q-learning updates. The algorithm learns a latent space that represents good policies and yields state clusters for options. We show that the DVQN algorithm is a promising approach for identifying initiation and termination conditions for option-based reinforcement learning. Experiments show that the DVQN algorithm, with automatic initiation and termination, has comparable performance to Rainbow and can maintain stability when trained for extended periods after convergence.
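To make the combined objective concrete, the following is a minimal sketch (not the authors' code) of a DVQN-style loss: a variational autoencoder term (MSE reconstruction plus KL-divergence regularization toward a unit Gaussian prior) added to a standard Q-learning TD loss computed from the latent code. The network shapes and the weighting coefficients `beta` and `lambda_q` are assumptions for illustration only.

```python
# Hypothetical sketch of a DVQN-style objective: VAE loss (MSE + KL) plus Q-learning TD loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DVQNSketch(nn.Module):
    def __init__(self, state_dim: int, latent_dim: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Linear(state_dim, 2 * latent_dim)  # outputs mean and log-variance
        self.decoder = nn.Linear(latent_dim, state_dim)       # reconstructs the state
        self.q_head = nn.Linear(latent_dim, n_actions)        # Q-values from the latent code

    def forward(self, state):
        mu, log_var = self.encoder(state).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterization trick
        return self.decoder(z), mu, log_var, self.q_head(z)

def dvqn_loss(model, target_model, batch, gamma=0.99, beta=1.0, lambda_q=1.0):
    state, action, reward, next_state, done = batch
    recon, mu, log_var, q = model(state)

    # Generative terms: MSE reconstruction plus KL divergence to a unit Gaussian prior.
    recon_loss = F.mse_loss(recon, state)
    kl_loss = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())

    # Traditional Q-learning TD target computed from a target network.
    with torch.no_grad():
        _, _, _, q_next = target_model(next_state)
        td_target = reward + gamma * (1 - done) * q_next.max(dim=-1).values
    q_taken = q.gather(-1, action.unsqueeze(-1)).squeeze(-1)
    td_loss = F.mse_loss(q_taken, td_target)

    return recon_loss + beta * kl_loss + lambda_q * td_loss
```

Under this reading, the Gaussian latent space that the encoder shapes is what the options machinery would later cluster to propose initiation and termination states.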