The success of deep reinforcement learning (DRL) lies in its power to learn a representation that is suitable for the underlying exploration and exploitation task. However, existing provable reinforcement learning algorithms with linear function approximation often assume that the feature representation is known and fixed. To understand how representation learning can improve the efficiency of RL, we study representation learning for a class of low-rank Markov Decision Processes (MDPs) whose transition kernel can be represented in a bilinear form. We propose a provably efficient algorithm called ReLEX that simultaneously learns the representation and performs exploration. We show that ReLEX always performs no worse than a state-of-the-art algorithm without representation learning, and is strictly better in terms of sample efficiency if the function class of representations enjoys a certain mild "coverage" property over the whole state-action space.
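For concreteness, a minimal sketch of the bilinear (low-rank) transition structure referred to above, following the standard low-rank MDP definition; the feature map $\phi$, the next-state embedding $\mu$, and the dimension $d$ are assumed notation rather than symbols introduced in this abstract:
\[
  \mathbb{P}(s' \mid s, a) \;=\; \big\langle \phi(s, a),\, \mu(s') \big\rangle,
  \qquad \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d, \quad
  \mu : \mathcal{S} \to \mathbb{R}^d,
\]
where, under this reading, representation learning amounts to selecting $\phi$ from a given function class rather than assuming it is known in advance.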