Automatic detection of students' engagement in online learning settings is a key element in improving the quality of learning and delivering personalized learning materials. The varying levels of engagement that students exhibit in an online classroom are an affective behavior that unfolds over space and time. Therefore, we formulate detecting students' engagement levels from videos as a spatio-temporal classification problem. In this paper, we present a novel end-to-end Residual Network (ResNet) and Temporal Convolutional Network (TCN) hybrid neural network architecture for detecting students' engagement levels in videos. The 2D ResNet extracts spatial features from consecutive video frames, and the TCN analyzes the temporal changes across frames to detect the level of engagement. The spatial and temporal arms of the hybrid network are jointly trained on raw video frames of a large publicly available students' engagement detection dataset, DAiSEE. We compared our method with several competing students' engagement detection methods on this dataset. The ResNet+TCN architecture outperforms all other studied methods, improves the state-of-the-art engagement level detection accuracy, and sets a new baseline for future research.
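The spatial-then-temporal pipeline described above can be sketched in PyTorch. This is a minimal illustrative mock-up under assumed shapes, not the paper's implementation: a tiny residual block stands in for the 2D ResNet backbone, dilated 1D convolutions stand in for the TCN, and all layer sizes, the class count (DAiSEE has four engagement levels), and the class name `ResNetTCN` are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SpatialBlock(nn.Module):
    """Tiny residual block standing in for the 2D ResNet feature extractor."""
    def __init__(self, channels=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Residual connection: output = ReLU(x + F(x)).
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class ResNetTCN(nn.Module):
    """Hypothetical sketch: per-frame spatial features, then temporal convs."""
    def __init__(self, num_classes=4, feat_dim=16):
        super().__init__()
        self.stem = nn.Conv2d(3, feat_dim, 7, stride=2, padding=3)
        self.res = SpatialBlock(feat_dim)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Dilated 1D convolutions over the time axis stand in for the TCN;
        # padding is chosen so the sequence length is preserved.
        self.tcn = nn.Sequential(
            nn.Conv1d(feat_dim, feat_dim, 3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv1d(feat_dim, feat_dim, 3, padding=4, dilation=4),
            nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, video):  # video: (batch, time, channels, height, width)
        b, t, c, h, w = video.shape
        x = video.reshape(b * t, c, h, w)                 # fold time into batch
        x = self.pool(self.res(self.stem(x))).flatten(1)  # (B*T, feat_dim)
        x = x.reshape(b, t, -1).transpose(1, 2)           # (B, feat_dim, T)
        x = self.tcn(x).mean(dim=2)                       # pool over time
        return self.head(x)                               # (B, num_classes)

model = ResNetTCN()
logits = model(torch.randn(2, 8, 3, 64, 64))  # 2 clips of 8 RGB frames
print(tuple(logits.shape))  # (2, 4): one score per engagement level
```

Folding the time dimension into the batch lets a single 2D backbone process every frame in one pass; the per-frame feature vectors are then reshaped into a sequence so the temporal convolutions can model how engagement evolves across frames.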