Human-robot interaction (HRI) benefits greatly from advances in machine learning, as they allow researchers to employ high-performance models for perceptual tasks like detection and recognition. Deep learning models in particular, whether pre-trained for feature extraction or used directly for classification, are now established methods for characterizing human behaviors in HRI scenarios and for building social robots that better understand those behaviors. Since HRI experiments are usually small-scale and constrained to particular lab environments, two questions arise: how well do deep learning models generalize to specific interaction scenarios, and how robust are they to environmental changes? These questions are important to address if the HRI field wishes to deploy social robotic companions that act consistently in real environments, i.e., changing lighting conditions or moving people should still produce the same recognition results. In this paper, we study the impact of different image conditions on the recognition of arousal and valence from human facial expressions using the FaceChannel framework \cite{Barro20}. Our results show how the interpretation of human affective states can shift greatly in either the positive or the negative direction even when the image properties are changed only slightly. We conclude the paper with important points to consider when employing deep learning models in order to ensure a sound interpretation of HRI experiments.
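The kind of robustness check described above can be sketched in a few lines: perturb an input image (here, a brightness change) and measure how much the predicted arousal and valence drift from the baseline prediction. The following is a minimal, self-contained illustration; `toy_affect_model` is a hypothetical stand-in for illustration only, not the actual FaceChannel network.

```python
import numpy as np

def perturb_brightness(img, factor):
    """Simulate a lighting change by scaling intensities, clipped to [0, 1]."""
    return np.clip(img * factor, 0.0, 1.0)

def toy_affect_model(img):
    """Hypothetical stand-in for an affect recognizer (NOT FaceChannel):
    maps image statistics to (arousal, valence) scores in [-1, 1]."""
    arousal = np.tanh(img.std() * 4.0 - 1.0)
    valence = np.tanh(img.mean() * 2.0 - 1.0)
    return arousal, valence

rng = np.random.default_rng(0)
face = rng.random((64, 64))  # placeholder grayscale "face" image in [0, 1]

base_a, base_v = toy_affect_model(face)
for factor in (0.5, 1.0, 1.5):
    a, v = toy_affect_model(perturb_brightness(face, factor))
    drift = abs(a - base_a) + abs(v - base_v)
    print(f"brightness x{factor}: arousal={a:+.3f} valence={v:+.3f} drift={drift:.3f}")
```

Even this toy setup shows the measurement pattern used in the paper: identical facial content under different image conditions can yield different affective readings, and the drift quantifies that inconsistency.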