The field of social robotics will likely need to depart from a paradigm of designed behaviours and imitation learning and adopt modern reinforcement learning (RL) methods to enable robots to interact fluidly and efficaciously with humans. In this paper, we present the Social Reward Function as a mechanism to provide (1) a real-time, dense reward function necessary for the deployment of RL agents in social robotics, and (2) a standardised objective metric for comparing the efficacy of different social robots. The Social Reward Function is designed to closely mimic the genetically endowed social perception capabilities of humans, in an effort to provide a simple, stable and culture-agnostic reward function. Presently, the available datasets are either small or significantly out-of-domain with respect to social robotics. The use of the Social Reward Function will allow larger in-domain datasets to be collected close to the behaviour policy of social robots, enabling further improvements both to reward functions and to the behaviour policies of social robots. We believe this will be the key enabler for developing efficacious social robots in the future.
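To make the idea of a real-time, dense social reward concrete, the following is a minimal sketch, not the paper's actual method: it assumes a hypothetical perception stack emitting per-frame social signals (`smile`, `gaze_contact`, `vocal_valence` are illustrative names, not from the paper) and combines them into a scalar reward computed at every control step, as an RL agent would require.

```python
from dataclasses import dataclass


@dataclass
class SocialPercepts:
    """Hypothetical per-frame social signals from a perception stack.

    All fields and their ranges are illustrative assumptions,
    not quantities defined in the paper.
    """
    smile: float          # estimated smile intensity, in [0, 1]
    gaze_contact: float   # fraction of frame with mutual gaze, in [0, 1]
    vocal_valence: float  # estimated valence of speech prosody, in [-1, 1]


def social_reward(p: SocialPercepts,
                  w_smile: float = 0.4,
                  w_gaze: float = 0.3,
                  w_voice: float = 0.3) -> float:
    """Dense scalar reward in [-1, 1], computable at every control step.

    Each [0, 1] signal is rescaled to [-1, 1] so that neutral behaviour
    yields roughly zero reward; the weights are arbitrary placeholders.
    """
    return (w_smile * (2.0 * p.smile - 1.0)
            + w_gaze * (2.0 * p.gaze_contact - 1.0)
            + w_voice * p.vocal_valence)
```

In practice such a reward would be emitted continuously alongside the robot's observations, so an off-the-shelf RL algorithm could optimise the interaction policy without hand-designed behaviours; the weighting and signal choice here are purely for illustration.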