Multiple choice questions (MCQs) are widely used in digital learning systems, as they allow for automating the assessment process. However, due to the increased digital literacy of students and the advent of social media platforms, MCQ tests are widely shared online, and teachers are continuously challenged to create new questions, which is an expensive and time-consuming task. A particularly sensitive aspect of MCQ creation is devising relevant distractors, i.e., wrong answers that are not easily identifiable as being wrong. This paper studies how a large existing set of manually created answers and distractors for questions over a variety of domains, subjects, and languages can be leveraged to help teachers in creating new MCQs, through the smart reuse of existing distractors. We built several data-driven models based on context-aware question and distractor representations, and compared them with static feature-based models. The proposed models are evaluated with automated metrics and in a realistic user test with teachers. Both automatic and human evaluations indicate that context-aware models consistently outperform the static feature-based approach. For our best-performing context-aware model, on average 3 of the 10 distractors shown to teachers were rated as high-quality distractors. We create a performance benchmark and make it public, to enable comparison between different approaches and to introduce a more standardized evaluation of the task. The benchmark contains a test set of 298 educational questions covering multiple subjects and languages, as well as a multilingual pool of 77k distractor vocabulary items for future research.
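To make the idea of distractor reuse concrete, the sketch below ranks candidate distractors from an existing pool by embedding similarity between the candidates and the new question plus its correct answer. This is a minimal illustration only: the encoder, the model name, and the toy pool are assumptions for demonstration, not the context-aware models or data described in the paper.

```python
# Minimal sketch of context-aware distractor reuse: rank candidate
# distractors from an existing pool by embedding similarity to a new
# question. Encoder choice and example pool are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed multilingual sentence encoder (not necessarily the paper's model).
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def rank_distractors(question: str, correct_answer: str,
                     pool: list[str], top_k: int = 10) -> list[str]:
    """Return the top_k pool entries most similar to the question context."""
    # Encode the question together with its correct answer as the "context".
    context_vec = encoder.encode([f"{question} {correct_answer}"])[0]
    pool_vecs = encoder.encode(pool)
    # Cosine similarity between the context and every candidate distractor.
    sims = pool_vecs @ context_vec / (
        np.linalg.norm(pool_vecs, axis=1) * np.linalg.norm(context_vec) + 1e-9
    )
    ranked = np.argsort(-sims)
    # Drop candidates identical to the correct answer before returning.
    return [pool[i] for i in ranked if pool[i] != correct_answer][:top_k]

# Hypothetical usage: suggest distractors for a new geography question.
pool = ["Paris", "Berlin", "Madrid", "Mitochondrion", "Photosynthesis", "Rome"]
print(rank_distractors("What is the capital of Italy?", "Rome", pool, top_k=3))
```

In such a setup, semantically related entries (e.g., other European capitals) would tend to rank above unrelated ones, which is the intuition behind reusing an existing distractor pool rather than authoring distractors from scratch.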