Many control tasks exhibit similar dynamics that can be modeled as sharing a common latent structure. Hidden-Parameter Markov Decision Processes (HiP-MDPs) explicitly model this structure to improve sample efficiency in multi-task settings. However, this setting makes strong assumptions on the observability of the state that limit its application in real-world scenarios with rich observation spaces. In this work, we leverage the idea of common structure from the HiP-MDP setting and extend it to enable robust state abstractions inspired by Block MDPs. We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings. Further, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks rather than on the number of tasks, a significant improvement over prior work that uses the same environment assumptions. To further demonstrate the efficacy of the proposed method, we empirically compare against multi-task and meta-reinforcement learning baselines and show improvement over them.