Our work aims to develop reinforcement learning algorithms that do not rely on the Markov assumption. We consider the class of Non-Markov Decision Processes whose histories can be abstracted into a finite set of states while preserving the dynamics. We call such an abstraction a Markov abstraction, since it induces a Markov Decision Process over a set of states that encode the non-Markov dynamics. This phenomenon underlies the recently introduced Regular Decision Processes (as well as POMDPs where only a finite number of belief states is reachable). In all such classes of decision processes, an agent that uses a Markov abstraction can rely on the Markov property to achieve optimal behaviour. We show that Markov abstractions can be learned during reinforcement learning. Our approach combines automata learning and classic reinforcement learning, and standard algorithms can be employed for both tasks. We show that our approach has PAC guarantees when the employed algorithms have PAC guarantees, and we also provide an experimental evaluation.
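For intuition only, the following is a minimal sketch, not the authors' implementation: a hand-written finite automaton stands in for a Markov abstraction that would otherwise be produced by an automata-learning algorithm, and a standard tabular Q-learning agent acts on the abstract state paired with the current observation. The environment, class names, and hyperparameters (MarkovAbstraction, FlipEnv, alpha, gamma, eps) are hypothetical illustrations.

```python
import random
from collections import defaultdict

class MarkovAbstraction:
    """Finite automaton mapping histories of (action, observation) pairs to abstract states.
    In the paper's setting, such an automaton would be learned; here it is given by hand."""
    def __init__(self, initial_state, transition):
        self.initial_state = initial_state
        self.transition = transition  # dict: (state, action, observation) -> next state
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state
        return self.state

    def step(self, action, observation):
        self.state = self.transition[(self.state, action, observation)]
        return self.state

class FlipEnv:
    """Toy non-Markov environment: the reward for action 1 depends on the previous
    action, i.e. on the history rather than on the current observation alone."""
    def reset(self):
        self.prev = 0
        return 0  # single dummy observation
    def step(self, action):
        reward = 1.0 if action == 1 and self.prev == 0 else 0.0
        self.prev = action
        return 0, reward

# Abstraction with two states tracking the last action taken; conditioning on
# (abstract state, observation) makes the decision process Markov.
transition = {(s, a, 0): a for s in (0, 1) for a in (0, 1)}
abstraction = MarkovAbstraction(0, transition)

# Classic tabular Q-learning over the induced Markov state.
Q = defaultdict(float)
env, alpha, gamma, eps = FlipEnv(), 0.1, 0.9, 0.1
obs = env.reset()
s = (abstraction.reset(), obs)
for _ in range(5000):
    a = random.choice([0, 1]) if random.random() < eps else max((0, 1), key=lambda x: Q[(s, x)])
    obs, r = env.step(a)
    s2 = (abstraction.step(a, obs), obs)
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
    s = s2

print({k: round(v, 2) for k, v in Q.items()})
```

Under this sketch, the learned Q-values favour alternating actions (take action 1 only after action 0), which is optimal here but unlearnable for a memoryless agent that ignores the abstract state.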