We introduce DAiSEE, the largest multi-label video classification dataset for recognizing user affective states, including engagement, in the wild. It comprises over two and a half million video frames (2,723,882) in 9,068 video snippets (about 25 hours of recording) captured from 112 users. In addition to engagement, it includes the associated affective states of boredom, confusion, and frustration, which are relevant to such applications. Each affective state is labeled at one of four levels, from very low to very high, collected using crowd annotators and correlated with a gold-standard annotation obtained from a team of expert psychologists. We also report benchmark results on this dataset using current state-of-the-art video classification methods, and baselines for each of the labels are provided with the dataset. To the best of our knowledge, DAiSEE is the first and largest such dataset in this domain. We believe that DAiSEE will provide the research community with challenges in feature extraction, context-based inference, and the development of suitable machine learning methods for related tasks, thus providing a springboard for further research.