Existing approaches to automated essay scoring and document representation learning typically rely on discourse parsers to incorporate discourse structure into text representations. However, parser performance is not always adequate, especially on noisy texts such as student essays. In this paper, we propose an unsupervised pre-training approach that captures the discourse structure of essays in terms of coherence and cohesion, without requiring any discourse parser or annotation. We introduce several token-, sentence-, and paragraph-level corruption techniques for our pre-training approach, and we augment masked language modeling pre-training with our method to leverage both contextualized and discourse information. Our unsupervised approach achieves a new state-of-the-art result on the essay Organization scoring task.
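To make the corruption idea concrete, the following is a minimal, hypothetical sketch of what token- and sentence-level corruption of an essay might look like; the function name, corruption choices, and parameters are illustrative assumptions, not the authors' implementation.

```python
import random

def corrupt_essay(sentences, level="sentence", seed=0):
    """Illustrative discourse corruption for coherence/cohesion pre-training.

    Hypothetical sketch: a model is then trained to distinguish the
    original essay from its corrupted version.
    """
    rng = random.Random(seed)
    if level == "sentence":
        # Sentence-level corruption: shuffle sentence order, breaking
        # coherence while leaving each sentence intact.
        corrupted = sentences[:]
        rng.shuffle(corrupted)
        return corrupted
    if level == "token":
        # Token-level corruption: swap one adjacent token pair per
        # sentence, disturbing local cohesion.
        corrupted = []
        for s in sentences:
            toks = s.split()
            if len(toks) > 1:
                i = rng.randrange(len(toks) - 1)
                toks[i], toks[i + 1] = toks[i + 1], toks[i]
            corrupted.append(" ".join(toks))
        return corrupted
    raise ValueError(f"unknown corruption level: {level}")
```

Paragraph-level corruption would follow the same pattern, e.g. shuffling or dropping whole paragraphs; the corrupted/original pairs supply training signal without any discourse parser or human annotation.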