Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural networks or history-based state abstraction, perform better than their memory-less counterparts, because function approximation in Markov decision processes (MDPs) can be viewed as inducing a partially observable MDP (POMDP). However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm, and we numerically evaluate its effectiveness on a set of continuous control tasks.