In large-scale problems, standard reinforcement learning algorithms suffer from slow learning. In this paper, we follow the framework of using subspaces of the state space to tackle this problem. We propose a free-energy minimization framework for selecting the subspaces and for integrating the policy of the state space into the subspaces. Our free-energy minimization framework rests on a Thompson sampling policy together with the behavioral policies of the subspaces and the state space. It is therefore applicable to a variety of settings: discrete or continuous state spaces, and model-free or model-based tasks. Through a set of experiments, we show that this general framework substantially improves learning speed. We also provide a convergence proof.
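For readers unfamiliar with the Thompson sampling policy named in the abstract, the following is a minimal, self-contained sketch of Thompson sampling on a Bernoulli bandit. It is illustrative only: the arm count, the reward probabilities in `true_probs`, and the bandit setting itself are assumptions for exposition, not the paper's subspace-based method.

```python
import random

# Illustrative Thompson sampling on a Bernoulli bandit (hypothetical
# environment; the paper applies Thompson sampling to subspace selection,
# not to bandit arms).
true_probs = [0.2, 0.5, 0.8]          # assumed arm reward probabilities
alpha = [1.0] * len(true_probs)       # Beta posterior: successes + 1
beta = [1.0] * len(true_probs)        # Beta posterior: failures + 1

for step in range(1000):
    # Sample a plausible mean reward for each arm from its posterior,
    # then act greedily with respect to the sampled values.
    samples = [random.betavariate(alpha[i], beta[i])
               for i in range(len(true_probs))]
    arm = max(range(len(true_probs)), key=lambda i: samples[i])
    reward = 1 if random.random() < true_probs[arm] else 0
    # Update the chosen arm's posterior with the observed reward.
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior means:",
      [round(a / (a + b), 3) for a, b in zip(alpha, beta)])
```

Because actions are chosen by sampling from the posterior rather than by maximizing a point estimate, exploration decays naturally as the posterior concentrates, which is the property that makes Thompson sampling attractive as a selection policy.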