Many reinforcement learning (RL) environments consist of independent entities that interact sparsely. In such environments, RL agents have only limited influence over other entities in any particular situation. Our idea in this work is that learning can be efficiently guided by knowing when and what the agent can influence with its actions. To achieve this, we introduce a measure of \emph{situation-dependent causal influence} based on conditional mutual information and show that it can reliably detect states of influence. We then propose several ways to integrate this measure into RL algorithms to improve exploration and off-policy learning. All modified algorithms show strong increases in data efficiency on robotic manipulation tasks.
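As a minimal sketch of such a measure (notation here is illustrative, not the paper's exact formulation): the influence of the agent's action $A$ on the next state $S'_j$ of entity $j$, in a given situation $s$, can be quantified by the state-conditional mutual information
\[
C^j(s) \;=\; I\!\left(S'_j;\, A \,\middle|\, S = s\right)
\;=\; \mathbb{E}_{a \sim \pi(\cdot \mid s)}\!\left[ D_{\mathrm{KL}}\!\left( p(S'_j \mid s, a) \,\middle\|\, p(S'_j \mid s) \right) \right],
\]
which is large only when different actions lead to noticeably different distributions over that entity's next state, i.e. when the agent actually has influence in state $s$.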