Deep learning models for learning analytics have become increasingly popular over the last few years; however, these approaches are still not widely adopted in real-world settings, likely due to a lack of trust and transparency. In this paper, we tackle this issue by implementing explainable AI methods for black-box neural networks. This work focuses on the context of online and blended learning and the use case of student success prediction models. We use a pairwise study design, enabling us to investigate controlled differences between pairs of courses. Our analyses cover five course pairs that differ in one educationally relevant aspect and two popular instance-based explainable AI methods (LIME and SHAP). We quantitatively compare the distances between the explanations across courses and methods. We then validate the explanations of LIME and SHAP with 26 semi-structured interviews of university-level educators regarding which features they believe contribute most to student success, which explanations they trust most, and how they could transform these insights into actionable course design decisions. Our results show that quantitatively, explainers significantly disagree with each other about what is important, and qualitatively, experts themselves do not agree on which explanations are most trustworthy. All code, extended results, and the interview protocol are provided at https://github.com/epfl-ml4ed/trusting-explainers.
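To illustrate what comparing "distances between the explanations" can look like, the following is a minimal sketch, not the paper's implementation. It assumes each explainer yields one importance score per feature and uses Jensen-Shannon distance over normalized absolute scores. The abstract does not name this metric, and the feature scores and function names below are hypothetical.

```python
import math


def normalize(importances):
    """Convert raw importance scores into a distribution over features.

    Assumption: only the magnitude of a score matters, so signs are dropped.
    """
    magnitudes = [abs(v) for v in importances]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("importance vector is all zeros")
    return [m / total for m in magnitudes]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(a, b):
    """Jensen-Shannon distance between two importance vectors (range 0 to 1)."""
    if len(a) != len(b):
        raise ValueError("explanations must cover the same features")
    p, q = normalize(a), normalize(b)
    # Mixture distribution; it is positive wherever p or q is positive.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    js_divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(js_divergence, 0.0))


# Hypothetical per-feature importance scores for the same model and course.
lime_scores = [0.40, -0.10, 0.30, 0.20]
shap_scores = [0.05, 0.50, -0.25, 0.20]

print(jensen_shannon_distance(lime_scores, shap_scores))
```

A distance of 0 means the two explainers rank feature importance identically under this normalization, while 1 means they attribute importance to entirely disjoint features.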