Automatic tagging of knowledge points for practice problems underpins question bank management and supports more automated, intelligent education, so studying automatic tagging techniques for practice problems has clear practical value. However, few studies address the automatic tagging of knowledge points for math problems. Math texts have more complex structure and semantics than general texts because they contain distinctive elements such as symbols and formulas. As a result, directly applying general-domain text classification techniques rarely meets the accuracy required for knowledge point prediction. Taking K12 math problems as the research object, this paper proposes the LABS model, which is based on label-semantic attention and multi-label smoothing combined with textual features, to improve the automatic tagging of knowledge points for math problems. The model combines general-domain text classification techniques with the distinctive features of math texts. The results show that models using either label-semantic attention or multi-label smoothing outperform the traditional BiLSTM model on precision, recall, and F1-score, while the LABS model, which uses both, performs best. This indicates that label information can guide the neural network to extract meaningful information from the problem text, which improves the model's text classification performance. Moreover, multi-label smoothing combined with textual features can more fully exploit the relationship between text and labels, improving both the model's ability to generalize to new data and its classification accuracy.
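As a rough illustration of the two ingredients named in the abstract, the following minimal sketch shows one common form of label-semantic attention (dot-product scores between label embeddings and token states, normalized over tokens) and a plain uniform smoothing of multi-hot targets. All names here (`label_attention`, `smooth_multi_hot`, `eps`) are hypothetical, the token states stand in for BiLSTM outputs, and the smoothing is a generic uniform variant: the abstract does not specify how LABS incorporates textual features into smoothing, so that part is not reproduced.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(hidden, label_emb):
    """Assumed form of label-semantic attention (not taken from the paper).

    hidden:    T x d list of token states (e.g. BiLSTM outputs).
    label_emb: L x d list of label embeddings.
    Returns (weights, reps): L x T attention weights over tokens and
    L x d label-specific text representations.
    """
    weights, reps = [], []
    for lab in label_emb:
        # Dot-product score between this label and every token state.
        scores = [sum(h_j * l_j for h_j, l_j in zip(h, lab)) for h in hidden]
        w = softmax(scores)
        # Attention-weighted sum of token states for this label.
        rep = [
            sum(w[t] * hidden[t][j] for t in range(len(hidden)))
            for j in range(len(lab))
        ]
        weights.append(w)
        reps.append(rep)
    return weights, reps


def smooth_multi_hot(target, eps=0.1):
    """Uniform smoothing of a multi-hot target (an assumption, not LABS):
    positives become 1 - eps, negatives become eps."""
    return [y * (1.0 - eps) + (1.0 - y) * eps for y in target]
```

In a full model, each label-specific representation would typically feed a per-label classifier trained with a binary cross-entropy loss against the smoothed targets; that wiring is likewise an assumption rather than something the abstract states.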