Background: Student engagement (SE) in virtual learning can strongly affect whether learning objectives are met and how likely students are to drop out of a program. Developing Artificial Intelligence (AI) models for automatic SE measurement requires annotated datasets. However, existing SE datasets suffer from inconsistent definitions and annotation protocols, most of which are not aligned with the definition of SE in educational psychology. These inconsistencies can mislead the development of generalizable AI models and make it hard to compare the performance of models developed on different datasets. The objective of this critical review was to explore the existing SE datasets and highlight inconsistencies in their engagement definitions and annotation protocols. Methods: Several academic databases were searched for publications introducing new SE datasets. Datasets containing students' single- or multi-modal data in online or offline computer-based virtual learning sessions were included. The definition and annotation of SE in each dataset were analyzed along seven dimensions of engagement annotation that we defined: sources, data modalities, timing, temporal resolution, level of abstraction, combination, and quantification. Results: Thirty SE measurement datasets met the inclusion criteria. The reviewed datasets used highly diverse and inconsistent definitions and annotation protocols. Unexpectedly, very few of them used existing psychometrically validated scales in their definition of SE. Discussion: The inconsistent definition and annotation of SE are problematic for research on developing comparable AI models for automatic SE measurement. We introduce some existing SE definitions and protocols from settings other than virtual learning that have the potential to be used in virtual learning.