Many reinforcement learning (RL) environments consist of independent entities that interact sparsely. In such environments, RL agents have only limited influence over other entities in any particular situation. Our idea in this work is that learning can be efficiently guided by knowing when and what the agent can influence with its actions. To achieve this, we introduce a measure of situation-dependent causal influence based on conditional mutual information and show that it can reliably detect states of influence. We then propose several ways to integrate this measure into RL algorithms to improve exploration and off-policy learning. All modified algorithms show strong increases in data efficiency on robotic manipulation tasks.
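To make the abstract's influence measure concrete, here is one plausible instantiation of a situation-dependent causal influence measure based on conditional mutual information; the specific notation (entity index $j$, next-state variable $S'_j$, measure name $C^j$) is our own sketch and is not taken from the text above:

\[
C^j(s) \;:=\; I\!\left(S'_j;\, A \,\middle|\, S = s\right)
\;=\; \mathbb{E}_{a \sim \pi(\cdot \mid s)}\, D_{\mathrm{KL}}\!\left( p\!\left(s'_j \mid s, a\right) \,\middle\|\, p\!\left(s'_j \mid s\right) \right).
\]

Under this reading, $C^j(s)$ is large precisely in states $s$ where the choice of action meaningfully shifts the predicted next state of entity $j$, so thresholding it gives a per-state detector of influence that could then be plugged into exploration bonuses or off-policy sample prioritization, as the abstract describes.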