We collected a new dataset comprising approximately eight hours of audiovisual recordings of a group of students, together with their self-evaluation scores for classroom engagement. The dataset and data analysis scripts are available in our open-source repository. We developed baseline face-based and group-activity-based image and video recognition models. Our image models achieve 45-85% test accuracy on the person-based classification task using face-area inputs, and our video models achieve up to 71% test accuracy on group-level prediction using group-activity video inputs. In this technical report, we share the details of our end-to-end human-centered engagement analysis pipeline, from data collection to model development.