This paper introduces a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and adapt its gaze control strategy for human-robot interaction without the use of external sensors or human supervision. The robot learns to focus its attention on groups of people from its own audio-visual experiences, independently of the number of people, their positions, and their physical appearances. In particular, we use a recurrent neural network architecture in combination with Q-learning to find an optimal action-selection policy; we pre-train the network in a simulated environment that mimics realistic scenarios involving speaking and silent participants, thus avoiding the need for tedious sessions of a robot interacting with people. Our experimental evaluation suggests that the proposed method is robust with respect to parameter estimation, i.e., the estimated parameter values do not have a decisive impact on performance. The best results are obtained when audio and visual information are used jointly. Experiments with the Nao robot indicate that our framework is a step towards the autonomous learning of socially acceptable gaze behavior.
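To make the described setup concrete, the following is a minimal sketch, not the authors' exact architecture, of a recurrent Q-network trained with a one-step Q-learning target, as suggested by the combination of a recurrent neural network and Q-learning in the abstract. All dimensions, the action set, and the training hyperparameters are illustrative assumptions; the real system would consume audio-visual features and a task-specific reward.

import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """Recurrent Q-network: GRU over audio-visual observations, linear Q-value head."""
    def __init__(self, obs_dim=32, hidden_dim=64, n_actions=4):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)  # one Q-value per gaze action

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim) sequence of audio-visual features
        out, h = self.gru(obs_seq, h0)
        return self.head(out), h  # Q-values at every time step

# One-step Q-learning update on a batch of (simulated) transitions; shapes are hypothetical.
qnet = RecurrentQNet()
optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
gamma = 0.9  # discount factor (assumed value)

obs = torch.randn(8, 10, 32)          # 8 sequences of 10 observations
actions = torch.randint(0, 4, (8, 10))  # gaze actions taken
rewards = torch.randn(8, 10)            # rewards from the simulated environment

q_all, _ = qnet(obs)                                        # (8, 10, 4)
q_taken = q_all.gather(2, actions.unsqueeze(-1)).squeeze(-1)
with torch.no_grad():
    q_next = q_all[:, 1:].max(dim=2).values                 # bootstrap from next step
target = rewards[:, :-1] + gamma * q_next
loss = nn.functional.mse_loss(q_taken[:, :-1], target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

For simplicity this sketch bootstraps from the online network itself; a practical implementation would typically use a separate target network and an experience-replay or sequence-sampling scheme during the simulated pre-training stage.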