This paper shows that a popular approach to the supervised embedding of documents for classification, namely contrastive Word Mover's Embedding, can be significantly enhanced by adding interpretability. This interpretability is achieved by incorporating a clustering-promoting mechanism into the contrastive loss. On several public datasets, we show that our method improves significantly upon existing baselines while making the clusters interpretable by identifying a set of keywords that are most representative of a particular class. Our approach was motivated in part by the need to develop Natural Language Processing (NLP) methods for the \textit{novel problem of assessing student work for scientific writing and thinking}, a problem that is central to the area of (educational) Learning Sciences (LS). In this context, we show that our approach leads to a meaningful assessment of student lab reports from a biology class and can help LS researchers gain insights into student understanding and assess evidence of scientific thought processes.