Providing Reinforcement Learning (RL) agents with human feedback can dramatically improve various aspects of learning. However, previous methods require the human observer to give input explicitly (e.g., button presses, a voice interface), burdening the human in the loop of the RL agent's learning process. Further, explicit human advice (feedback) is sometimes difficult or impossible to obtain, e.g., in autonomous driving or rehabilitation for disabled users. In this work, we investigate capturing humans' intrinsic reactions as implicit (and natural) feedback through EEG in the form of error-related potentials (ErrPs), providing a natural and direct way for humans to improve RL agent learning. In this way, human intelligence can be integrated with RL algorithms via implicit feedback to accelerate the agent's learning. We develop three reasonably complex 2D discrete navigation games to experimentally evaluate the overall performance of the proposed work. The major contributions of our work are as follows: (i) we propose and experimentally validate zero-shot learning of ErrPs, where the ErrP decoder can be trained on one game and transferred to other unseen games; (ii) we propose a novel RL framework for integrating implicit human feedback via ErrPs with the RL agent, improving label efficiency and robustness to human mistakes; and (iii) compared to prior works, we scale the application of ErrPs to reasonably complex environments and demonstrate the significance of our approach for accelerated learning through real user experiments.
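As an illustration only (the abstract does not specify the integration mechanism), the following minimal sketch shows one plausible way ErrP-derived implicit feedback could be folded into an RL agent's reward signal; the `ErrPDecoder` class, the `shaped_reward` helper, and the penalty weight are hypothetical placeholders, not the paper's actual framework.

```python
# Illustrative sketch only (hypothetical names): ErrP-decoded implicit feedback
# used as a negative reward-shaping signal for an RL agent.
import random


class ErrPDecoder:
    """Stand-in for a classifier trained on EEG error-related potentials (ErrPs)."""

    def predict_error(self, eeg_epoch) -> bool:
        # A real decoder would classify the post-action EEG window;
        # here we return a random label purely for illustration.
        return random.random() < 0.2


def shaped_reward(env_reward: float, errp_detected: bool, penalty: float = 1.0) -> float:
    """Penalize transitions the human observer implicitly flagged as erroneous."""
    return env_reward - penalty if errp_detected else env_reward


# Example usage with dummy values:
decoder = ErrPDecoder()
eeg_epoch = [0.0] * 256  # placeholder for one post-action EEG window
r = shaped_reward(env_reward=0.0, errp_detected=decoder.predict_error(eeg_epoch))
print(r)
```

Under this reading, the ErrP decoder acts as a label-efficient source of negative feedback: only transitions that elicit an error-related response from the observer are penalized, with no explicit button press or verbal input required.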