Options provide a framework for reasoning across multiple time scales in reinforcement learning (RL). Following the recent surge of interest in unsupervised learning within the RL research community, the option framework has been adapted to use empowerment: a measure of how much influence an agent has on its environment and of its ability to perceive that influence. Empowerment can be optimized without any supervision from the environment's reward structure. Many recent papers modify this concept in various ways and achieve commendable results. Through these modifications, however, the original context of empowerment is often lost. In this work, we offer a comparative study of such papers through the lens of the original empowerment principle.
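The original empowerment principle is information-theoretic: the empowerment of a state s is the channel capacity from the agent's actions to the states it subsequently observes, max over p(a) of I(A; S' | s). The sketch below computes this quantity with the Blahut-Arimoto algorithm. It assumes a single state, one-step actions, and a known discrete transition matrix; the original formulation is usually stated for n-step action sequences and sensor readings. The function name `empowerment` and its arguments are hypothetical and are not taken from any of the surveyed papers.

```python
import math


def empowerment(transition, tol=1e-10, max_iter=10_000):
    """Empowerment (in bits) of one state, given a one-step action channel.

    transition[a][s2] is the (assumed known) probability of reaching next
    state s2 after action a. Empowerment is the capacity of this
    action -> next-state channel, estimated with Blahut-Arimoto.
    """
    n_actions = len(transition)
    n_states = len(transition[0])
    p = [1.0 / n_actions] * n_actions  # start from a uniform action distribution
    capacity = 0.0
    for _ in range(max_iter):
        # Marginal over next states induced by the current action distribution.
        q = [
            sum(p[a] * transition[a][s] for a in range(n_actions))
            for s in range(n_states)
        ]
        # KL divergence D(P(.|a) || q) for each action, in nats.
        d = [
            sum(
                pr * math.log(pr / q[s])
                for s, pr in enumerate(transition[a])
                if pr > 0.0
            )
            for a in range(n_actions)
        ]
        lower = sum(p[a] * d[a] for a in range(n_actions))  # I(A; S') under p
        upper = max(d)  # standard upper bound on the channel capacity
        capacity = lower
        if upper - lower < tol:
            break
        # Multiplicative update: favour actions whose outcomes are more distinguishable.
        w = [p[a] * math.exp(d[a]) for a in range(n_actions)]
        z = sum(w)
        p = [x / z for x in w]
    return capacity / math.log(2)  # convert nats to bits
```

For example, four actions that deterministically reach four distinct states give 2 bits of empowerment, while actions that all lead to the same state give 0 bits. The uniform starting distribution keeps every action probability positive, so no logarithm sees a zero marginal, and the gap between the two bounds serves as the stopping criterion.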