Detecting the mental states of human users is crucial for the development of cooperative and intelligent robots, as it enables the robot to understand the user's intentions and desires. Despite its importance, it is difficult to obtain a large amount of high-quality data for training automatic recognition algorithms, as the time and effort required to collect and label such data is prohibitively high. In this paper we present a multimodal machine learning approach for detecting dis-/agreement and confusion states in a human-robot interaction environment, using just a small amount of manually annotated data. We collected a data set by conducting a human-robot interaction study and developed a novel preprocessing pipeline for our machine learning approach. By combining semi-supervised and supervised architectures, we are able to achieve an average F1-score of 81.1\% for dis-/agreement detection with a small amount of labeled data and a large unlabeled data set, while simultaneously increasing the robustness of the model compared to the purely supervised approach.