The capability to learn interactively from human feedback would enable agents to be deployed in new settings. For example, even novice users could train service robots on new tasks naturally and interactively. Human-in-the-loop Reinforcement Learning (HRL) combines human feedback with Reinforcement Learning (RL) techniques. However, state-of-the-art interactive learning techniques suffer from slow learning, leading to a frustrating experience for the human. We approach this problem by extending the HRL framework TAMER for evaluative feedback with the ability to enrich human feedback with two types of counterfactual explanations (action-based and state-based). We show experimentally that our extensions improve the speed of learning.
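To make the setting concrete, the sketch below illustrates a minimal TAMER-style learner: a model of the human reward signal H(s, a) is trained from evaluative feedback, and the agent acts greedily with respect to it. The optional "better action" argument hints at how an action-based counterfactual could be folded into the same update. This is an illustrative assumption, not the authors' implementation; all names (HumanRewardModel, apply_feedback, etc.) and the treatment of the counterfactual signal are hypothetical.

```python
import numpy as np

class HumanRewardModel:
    """Linear model of the human reward signal H(s, a), one weight vector per action."""
    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = np.zeros((n_actions, n_features))
        self.lr = lr

    def predict(self, features, action):
        return self.w[action] @ features

    def update(self, features, action, human_reward):
        # TAMER-style supervised update towards the observed human signal.
        error = human_reward - self.predict(features, action)
        self.w[action] += self.lr * error * features

def act_greedily(model, features, n_actions):
    # The agent exploits the learned human reward model directly.
    return int(np.argmax([model.predict(features, a) for a in range(n_actions)]))

def apply_feedback(model, features, taken_action, reward, better_action=None):
    # Evaluative feedback on the executed action.
    model.update(features, taken_action, reward)
    # Hypothetical action-based counterfactual: the human names a preferred
    # alternative, which is treated as an extra positive training example.
    if better_action is not None:
        model.update(features, better_action, abs(reward))
```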