We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states. BRIEE interleaves latent state discovery, exploration, and exploitation, and provably learns a near-optimal policy with sample complexity scaling polynomially in the number of latent states, the number of actions, and the time horizon, with no dependence on the size of the potentially infinite observation space. Empirically, we show that BRIEE is more sample-efficient than the state-of-the-art Block MDP algorithm HOMER and other empirical RL baselines on challenging rich-observation combination-lock problems that require deep exploration.
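The abstract describes BRIEE only at a high level, so below is a minimal, hypothetical sketch of what "interleaving latent state discovery, exploration, and exploitation" can look like; it is not the paper's actual algorithm. It alternates data collection, per-level latent decoding, and optimistic planning on a toy Block MDP, with k-means substituted for BRIEE's learned decoder and epsilon-greedy rollouts substituted for its exploration policies. All names (env_step, decoders, policy) and constants are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
H, S, A, D, EPISODES = 4, 3, 2, 8, 300  # horizon, latent states, actions, obs dim, rollouts

mu = 3.0 * rng.normal(size=(S, D))              # emission mean per latent state
P = rng.dirichlet(np.ones(S), size=(H, S, A))   # latent transitions P(s' | h, s, a)
R = rng.random((H, S, A))                       # latent mean rewards in [0, 1)

def env_step(h, s, a):
    """One step of the toy Block MDP: latent transition + noisy rich observation."""
    s2 = rng.choice(S, p=P[h, s, a])
    return s2, mu[s2] + 0.3 * rng.normal(size=D), R[h, s, a]

decoders = [None] * (H + 1)            # per-level decoder: observation -> latent id
policy = np.zeros((H, S), dtype=int)   # greedy action per (level, decoded state)

for phase in range(3):  # interleave: collect -> decode -> plan, then repeat
    # 1) Exploration/exploitation: roll out the current greedy policy mixed with
    #    random actions (a crude stand-in for BRIEE's exploration policies).
    data = [[] for _ in range(H)]      # (obs_h, action, reward, obs_{h+1}) per level
    for _ in range(EPISODES):
        s, x = 0, mu[0] + 0.3 * rng.normal(size=D)
        for h in range(H):
            if decoders[h] is not None and rng.random() > 0.3:
                a = policy[h, decoders[h].predict(x[None])[0]]
            else:
                a = int(rng.integers(A))
            s, x2, rew = env_step(h, s, a)
            data[h].append((x, a, rew, x2))
            x = x2
    # 2) Latent state discovery: cluster observations at each level
    #    (k-means replaces BRIEE's learned block-structured decoder).
    for h in range(H + 1):
        xs = np.array([t[0] for t in data[h]] if h < H else [t[3] for t in data[H - 1]])
        decoders[h] = KMeans(n_clusters=S, n_init=10).fit(xs)
    # 3) Planning: estimate a tabular model over decoded states and run dynamic
    #    programming, optimistic (value H - h) on unvisited state-action pairs.
    cnt, rsum = np.zeros((H, S, A, S)), np.zeros((H, S, A))
    for h in range(H):
        for x, a, rew, x2 in data[h]:
            z = decoders[h].predict(x[None])[0]
            z2 = decoders[h + 1].predict(x2[None])[0]
            cnt[h, z, a, z2] += 1
            rsum[h, z, a] += rew
    V = np.zeros(S)
    for h in reversed(range(H)):
        n = np.maximum(cnt[h].sum(-1), 1)
        Q = np.where(cnt[h].sum(-1) > 0, rsum[h] / n + (cnt[h] @ V) / n, float(H - h))
        policy[h] = Q.argmax(-1)
        V = Q.max(-1)

print("greedy latent policy per (level, decoded state):\n", policy)
```

Because every component here operates only on decoded latent ids, the loop's cost depends on the number of clusters, actions, and horizon rather than on the observation dimension, which mirrors the sample-complexity claim in the abstract at a schematic level.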