Educational artificial intelligence aims to benefit tasks in the education domain, such as intelligent test-paper generation and consolidation exercises, whose core technique is matching exercises, known as the finding-similar-exercises (FSE) problem. Most existing approaches emphasize the model's ability to represent exercises; unfortunately, many challenges remain, such as data scarcity, insufficient understanding of exercises, and high label noise. We release BERT$_{Edu}$, a Chinese educational pre-trained language model for the label-scarce setting, and introduce exercise normalization to overcome the diversity of mathematical formulas and terms in exercises. We design new auxiliary tasks in an innovative way based on problem-solving ideas and propose a highly effective MoE-enhanced multi-task model for the FSE task to attain a better understanding of exercises. In addition, confident learning is utilized to prune the training set and overcome the high noise in the labeled data. Experiments show that the methods proposed in this paper are very effective.
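The label-pruning step mentioned above can be illustrated with a minimal, self-contained sketch of the core confident-learning idea: compare each example's out-of-sample predicted probabilities against per-class confidence thresholds and drop examples whose label disagrees with a confidently predicted other class. All data, array names, and thresholds here are hypothetical toy values, not the paper's actual pipeline.

```python
import numpy as np

# Hypothetical toy data: 6 exercise pairs, 2 classes (0 = not similar, 1 = similar).
# pred_probs would come from a cross-validated model, so probabilities are
# out-of-sample for each example.
labels = np.array([1, 1, 0, 0, 1, 0])
pred_probs = np.array([
    [0.1, 0.9],
    [0.8, 0.2],   # labeled 1, but the model is confident it is 0 -> likely noise
    [0.7, 0.3],
    [0.6, 0.4],
    [0.2, 0.8],
    [0.3, 0.7],   # labeled 0, but the model is confident it is 1 -> likely noise
])

# Per-class confidence threshold: the mean self-confidence of examples
# carrying that label (the core quantity in confident learning).
n_classes = pred_probs.shape[1]
thresholds = np.array([
    pred_probs[labels == c, c].mean() for c in range(n_classes)
])

# Flag an example as a likely label error if its predicted probability for
# some OTHER class meets that class's threshold.
suspect = np.array([
    any(pred_probs[i, c] >= thresholds[c]
        for c in range(n_classes) if c != labels[i])
    for i in range(len(labels))
])

# Keep only the confidently labeled examples for training.
clean_idx = np.where(~suspect)[0]
print(clean_idx)  # the two planted noisy examples (indices 1 and 5) are pruned
```

In practice this filtering is typically done with cross-validated predictions over the full training set (e.g. via the `cleanlab` library), so that no example's probabilities come from a model that saw its own, possibly wrong, label.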