Emotions are reactions that can be expressed through a variety of social signals. For example, anger can be expressed through a scowl, narrowed eyes, a long stare, or many other expressions. This complexity is problematic when attempting to recognize a human's expression in a human-robot interaction: categorical emotion models used in HRI typically use only a few prototypical classes, and do not cover the wide array of expressions found in the wild. We propose a data-driven method for increasing the number of known emotion classes present in human-robot interactions, to 28 classes or more. The method includes automatic segmentation of video streams into short (<10s) clips, and annotation using the large set of widely understood emojis as categories. In this work, we present our initial results on a large in-the-wild HRI dataset (UE-HRI), with 61 clips randomly sampled from the dataset and labeled with 28 different emojis. In particular, our results showed that the "skeptical" emoji was a common expression in our dataset, one that is not often considered in typical emotion taxonomies. This is a first step toward developing a rich taxonomy of emotional expressions that can later serve as labels for training machine learning models, enabling more accurate perception of humans by robots.
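To make the segmentation step concrete, here is a minimal sketch (not the paper's implementation; the function name and fixed-window strategy are illustrative assumptions) of splitting a video's timeline into short clips of at most 10 seconds for subsequent emoji annotation:

```python
# Illustrative sketch: divide a video timeline into short segments
# (each at most max_len_s seconds) suitable for per-clip annotation.
# The fixed-window approach is a simplifying assumption; the paper
# describes automatic segmentation, whose exact criteria may differ.

def segment_timeline(duration_s: float, max_len_s: float = 10.0):
    """Return (start, end) pairs covering [0, duration_s] in short clips."""
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        segments.append((start, end))
        start = end
    return segments

# Example: a 25-second interaction yields three clips.
print(segment_timeline(25.0))
```

Each resulting (start, end) pair would then be cut from the stream and shown to annotators, who pick the emoji best matching the person's expression in that clip.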