Videos can evoke a range of affective responses in viewers. The ability to predict evoked affect from a video, before viewers watch the video, can help in content creation and video recommendation. We introduce the Evoked Expressions from Videos (EEV) dataset, a large-scale dataset for studying viewer responses to videos. Each video is annotated at 6 Hz with 15 continuous evoked expression labels, corresponding to the facial expressions of viewers who reacted to the video. We use an expression recognition model within our data collection framework to achieve scalability. In total, there are 36.7 million annotations of viewer facial reactions to 23,574 videos (1,700 hours). We use a publicly available video corpus to obtain a diverse set of video content. We establish baseline performance on the EEV dataset using an existing multimodal recurrent model. Transfer learning experiments show an improvement in performance on the LIRIS-ACCEDE video dataset when pre-trained on EEV. We hope that the size and diversity of the EEV dataset will encourage further explorations in video understanding and affective computing. A subset of EEV is released at https://github.com/google-research-datasets/eev.
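As a concrete illustration of the annotation format described above, the sketch below shows how the 6 Hz, 15-label annotations of the released EEV subset could be loaded and inspected with pandas. The file name and column names are assumptions made for illustration only; the actual layout is defined by the files in the GitHub repository linked above.

```python
# Minimal sketch of reading an EEV annotation file (CSV layout assumed).
# "eev_train.csv", "Video ID", and "Timestamp (milliseconds)" are
# hypothetical names used for illustration, not the confirmed schema.
import pandas as pd

# Hypothetical file: one row per (video, timestamp) pair sampled at 6 Hz,
# with one column per continuous evoked-expression label in [0, 1].
annotations = pd.read_csv("eev_train.csv")

# Treat every column other than the identifier/timestamp as an expression label.
expression_columns = [
    c for c in annotations.columns
    if c not in ("Video ID", "Timestamp (milliseconds)")
]

# Each video contributes roughly 6 annotation rows per second of content.
per_video = annotations.groupby("Video ID")
print(f"{per_video.ngroups} videos, "
      f"{len(annotations)} annotation rows, "
      f"{len(expression_columns)} expression labels")
```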