A long-term goal of reinforcement learning is to design agents that can autonomously interact and learn in the world. A critical challenge to such autonomy is the presence of irreversible states that require external assistance to recover from, such as when a robot arm has pushed an object off a table. While standard agents require constant monitoring to decide when to intervene, we aim to design proactive agents that request human intervention only when needed. To this end, we propose an algorithm that efficiently learns to detect and avoid irreversible states, and proactively asks for help if the agent does enter them. On a suite of continuous control environments with unknown irreversible states, we find that our algorithm exhibits better sample- and intervention-efficiency than existing methods. Our code is publicly available at https://sites.google.com/view/proactive-interventions.