To enhance human-robot social interaction, robots must be able to process multiple social cues in complex real-world environments. However, incongruence of input information across modalities is inevitable and can be challenging for robots to resolve. To tackle this challenge, our study adopted the neurorobotic paradigm of crossmodal conflict resolution to make a robot express human-like social attention. For the human study, a behavioural experiment was conducted with 37 participants. To improve ecological validity, we designed a round-table meeting scenario with three animated avatars. Each avatar wore a medical mask to obscure the facial cues of the nose, mouth, and jaw. The central avatar shifted its eye gaze while the peripheral avatars generated sound; gaze direction and sound location were either spatially congruent or incongruent. We observed that the central avatar's dynamic gaze could trigger crossmodal social attention responses: in particular, human performance was better under the congruent audio-visual condition than under the incongruent condition. For the robot study, our saliency prediction model was trained to detect social cues, predict audio-visual saliency, and attend selectively. After mounting the trained model on the iCub, the robot was exposed to laboratory conditions similar to those of the human experiment. While human performance was overall superior, our trained model demonstrated that it could replicate human-like attention responses.
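To make the saliency-prediction step concrete, the sketch below shows one common way such a model can be structured: per-modality encoders whose features are fused into a single spatial saliency map, from which an attended location can be selected. This is a minimal illustrative sketch only; the module names (VisualEncoder, AudioEncoder, AVSaliencyModel) and the late-fusion design are assumptions for exposition and do not reproduce the paper's actual architecture.

```python
# Minimal late-fusion audio-visual saliency sketch (hypothetical, not the
# paper's model): a visual encoder, an audio encoder broadcast spatially,
# and a 1x1 convolution that fuses both into a saliency map.
import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    """Encodes an RGB frame into a coarse spatial feature map."""
    def __init__(self, out_channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(16, out_channels, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )

    def forward(self, frame):
        return self.net(frame)

class AudioEncoder(nn.Module):
    """Encodes a time-pooled log-mel spectrogram and tiles it spatially."""
    def __init__(self, n_mels=64, out_channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_mels, 64), nn.ReLU(),
            nn.Linear(64, out_channels), nn.ReLU(),
        )

    def forward(self, spec, spatial_size):
        feat = self.net(spec)                       # (batch, C)
        h, w = spatial_size
        return feat[:, :, None, None].expand(-1, -1, h, w)

class AVSaliencyModel(nn.Module):
    """Fuses visual and audio features into a per-pixel saliency map."""
    def __init__(self, channels=32):
        super().__init__()
        self.visual = VisualEncoder(channels)
        self.audio = AudioEncoder(out_channels=channels)
        self.fuse = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, frame, spec):
        v = self.visual(frame)
        a = self.audio(spec, v.shape[-2:])
        s = self.fuse(torch.cat([v, a], dim=1))     # (batch, 1, H/4, W/4)
        return torch.sigmoid(s)

# Usage: one 128x128 frame plus a 64-bin time-pooled spectrogram; the robot
# would then attend to the argmax of the predicted map.
model = AVSaliencyModel()
saliency = model(torch.randn(1, 3, 128, 128), torch.randn(1, 64))
print(saliency.shape)  # torch.Size([1, 1, 32, 32])
```

Under this kind of late fusion, spatially congruent audio-visual cues reinforce the same map location, while incongruent cues compete, which is one plausible mechanism for the congruency effect described above.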