Continuously measuring users' engagement with a robot in a Human-Robot Interaction (HRI) setting paves the way towards in-situ reinforcement learning, improves metrics of interaction quality, and can guide interaction design and behaviour optimisation. However, engagement is widely considered multi-faceted and difficult to capture in a workable, generic computational model that can serve as an overall measure of engagement. Building on the intuitive way humans can successfully assess a situation's degree of engagement when they see it, we propose a novel regression model (utilising CNN and LSTM networks) that enables a robot to compute a single scalar engagement value during interactions with humans from standard video streams, obtained from the point of view of the interacting robot. The model is built on a long-term dataset from an autonomous tour-guide robot deployed in a public museum, with continuous numeric engagement annotations from three independent coders. We show that this model not only predicts engagement very well in our own application domain, but also transfers successfully to an entirely different dataset (with different tasks, environment, camera, robot and people). The trained model and the software are available to the HRI community as a tool to measure engagement in a variety of settings.
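To make the architecture described above concrete, the following is a minimal sketch of a CNN+LSTM regressor that maps a short video clip to a single scalar engagement score. All layer sizes, the frame count, the toy CNN backbone, and the sigmoid output range are illustrative assumptions, not the authors' actual configuration (the real model would typically use a pretrained image backbone for the per-frame features).

```python
# Hypothetical sketch of a CNN+LSTM engagement regressor: a per-frame CNN
# extracts features, an LSTM aggregates them over time, and a linear head
# regresses one scalar per clip. Sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class EngagementRegressor(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        # Small per-frame CNN (placeholder for a pretrained backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # LSTM aggregates the per-frame features over time.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Regression head maps the final hidden state to one scalar.
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, clip):
        # clip: (batch, time, channels, height, width)
        b, t, c, h, w = clip.shape
        feats = self.cnn(clip.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        # Squash to [0, 1] so the output reads as an engagement score.
        return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)

model = EngagementRegressor()
clip = torch.randn(2, 8, 3, 64, 64)  # two 8-frame video clips
scores = model(clip)                 # one engagement score per clip
print(scores.shape)
```

In deployment, such a model would be run in a sliding-window fashion over the robot's camera stream to yield a continuous engagement signal; the sequence model (here an LSTM) is what lets the score reflect temporal dynamics rather than a single frame.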