Automatic affect recognition from visual cues is an important task towards complete interaction between humans and machines. Applications can be found in tutoring systems and human-computer interaction. A critical step in this direction is facial feature extraction. In this paper, we propose a facial feature extractor model trained on an in-the-wild, massively collected video dataset provided by the RealEyes company. The dataset consists of a million labelled frames and 2,616 thousand subjects. As temporal information is important in the emotion recognition domain, we utilise LSTM cells to capture the temporal dynamics in the data. To demonstrate the favourable properties of our pre-trained model for modelling facial affect, we use the RECOLA database and compare against the current state-of-the-art approach. Our model achieves the best results in terms of the concordance correlation coefficient.
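To make the architecture described above concrete, the following is a minimal sketch of a per-frame facial feature extractor whose outputs are fed through LSTM cells to capture temporal dynamics. The abstract does not specify layer sizes, the CNN front-end, or the output targets, so all dimensions and layer choices here (clip length, frame size, two continuous outputs such as valence and arousal) are illustrative assumptions, not the paper's actual configuration.

```python
import tensorflow as tf

# Hypothetical dimensions; the abstract does not specify these.
SEQ_LEN, H, W, C = 16, 96, 96, 3   # frames per clip, frame height/width/channels
NUM_OUTPUTS = 2                    # e.g. valence and arousal (assumed)

frames = tf.keras.Input(shape=(SEQ_LEN, H, W, C))

# Per-frame facial feature extractor: a small CNN applied to every frame.
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
])
features = tf.keras.layers.TimeDistributed(cnn)(frames)

# LSTM cells capture the temporal dynamics across the frame features.
x = tf.keras.layers.LSTM(128, return_sequences=True)(features)

# Per-frame continuous affect predictions in [-1, 1].
outputs = tf.keras.layers.Dense(NUM_OUTPUTS, activation="tanh")(x)

model = tf.keras.Model(frames, outputs)
```

Returning a prediction per frame (`return_sequences=True`) matches the continuous, frame-level annotation scheme used by databases such as RECOLA.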
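The evaluation metric named in the abstract, the concordance correlation coefficient (CCC), has a standard closed form: \(\mathrm{CCC} = \frac{2\,\mathrm{cov}(x, y)}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}\). The sketch below computes it with NumPy; the function name is ours, not from the paper.

```python
import numpy as np

def concordance_correlation_coefficient(y_true, y_pred):
    """CCC between two 1-D sequences of annotations and predictions.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    var_true, var_pred = y_true.var(), y_pred.var()
    # Population covariance between the two sequences.
    cov = ((y_true - mean_true) * (y_pred - mean_pred)).mean()
    return 2.0 * cov / (var_true + var_pred + (mean_true - mean_pred) ** 2)
```

Unlike Pearson correlation, CCC penalises both scale and location shifts between predictions and gold annotations, which is why it is the standard measure for continuous affect estimation.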