Interactive reinforcement learning uses externally sourced information to speed up the learning process. When interacting with a learner agent, humans may provide either evaluative or informative advice. Prior research has studied the effect of human-sourced advice by incorporating real-time feedback into the interactive reinforcement learning process, aiming specifically to improve the agent's learning speed while minimising the time demands on the human. This work focuses on which of the two approaches, evaluative or informative, humans prefer as an instructional approach. It also presents an experimental setup for a human trial designed to compare the methods people use to deliver advice in terms of human engagement. The results show that users giving informative advice to the learner agent provide more accurate advice, are willing to assist the agent for longer, and provide more advice per episode. Additionally, in self-evaluation, participants using the informative approach rated the agent's ability to follow advice more highly and, accordingly, perceived their own advice to be more accurate than did participants providing evaluative advice.