Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples. Unfortunately, distant supervision may induce noisy labels, which undermine the robustness of the learned models and restrict their practical application. To mitigate this problem, recent works adopt self-training teacher-student frameworks that gradually refine the training labels and improve the generalization ability of NER models. However, we argue that the performance of current self-training frameworks for DS-NER is severely limited by their plain designs, namely inadequate student learning and coarse-grained teacher updating. In this paper, we make the first attempt to alleviate these issues by proposing: (1) adaptive teacher learning, which jointly trains two teacher-student networks and exploits both consistent and inconsistent predictions between the two teachers, thereby promoting comprehensive student learning; (2) a fine-grained student ensemble, which updates each fragment of the teacher model with a temporal moving average of the corresponding student fragment, enhancing the consistency of predictions on each model fragment against noise. To verify the effectiveness of the proposed method, we conduct experiments on four DS-NER datasets. The results demonstrate that our method significantly surpasses previous state-of-the-art (SOTA) methods.
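As an illustration of how consistent and inconsistent predictions between two teachers might be exploited, the sketch below builds token-level pseudo-labels: where the teachers agree on the argmax label, the token is trusted as-is; where they disagree, the sketch falls back to the averaged distribution and keeps the token only above a confidence threshold. This is a minimal sketch under assumed design choices; the function name `combine_teacher_predictions`, the averaging fallback, and the threshold are illustrative assumptions, not the paper's actual mechanism.

```python
import torch
import torch.nn.functional as F

def combine_teacher_predictions(logits_a: torch.Tensor,
                                logits_b: torch.Tensor,
                                conf_threshold: float = 0.7):
    """Build token-level pseudo-labels from two teachers' logits.

    Agreed tokens keep the agreed label; disagreed tokens fall back to the
    averaged distribution and are retained only if its confidence clears
    the threshold (both rules are illustrative assumptions).

    Returns (pseudo_labels, mask), each of shape [batch, seq_len]; tokens
    with mask == False should be ignored in the student loss.
    """
    probs_a = F.softmax(logits_a, dim=-1)
    probs_b = F.softmax(logits_b, dim=-1)
    pred_a = probs_a.argmax(dim=-1)
    pred_b = probs_b.argmax(dim=-1)

    agree = pred_a == pred_b
    avg_probs = (probs_a + probs_b) / 2
    avg_conf, avg_pred = avg_probs.max(dim=-1)

    # Agreed tokens are always trusted; disagreed tokens only when the
    # averaged confidence is high enough.
    mask = agree | (avg_conf >= conf_threshold)
    pseudo_labels = torch.where(agree, pred_a, avg_pred)
    return pseudo_labels, mask
```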
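The fine-grained student ensemble can be pictured as a per-fragment exponential moving average: instead of one global momentum for the whole model, each fragment of the teacher is updated from the matching student fragment. The sketch below assumes PyTorch, groups parameters into fragments by their top-level submodule name, and uses illustrative momentum values; all of these are assumptions rather than the paper's implementation.

```python
from typing import Dict

import torch

@torch.no_grad()
def fragment_ema_update(teacher: torch.nn.Module,
                        student: torch.nn.Module,
                        momentums: Dict[str, float],
                        default_momentum: float = 0.999) -> None:
    """Update each teacher fragment as a temporal moving average of the
    corresponding student fragment.

    Parameters are grouped into fragments by the prefix of their names
    (an assumed grouping, e.g. "encoder" vs. "classifier"), and each
    fragment may use its own momentum.
    """
    student_params = dict(student.named_parameters())
    for name, t_param in teacher.named_parameters():
        # Pick the momentum of the fragment this parameter belongs to.
        fragment = name.split(".")[0]
        m = momentums.get(fragment, default_momentum)
        # EMA: theta_teacher <- m * theta_teacher + (1 - m) * theta_student
        t_param.mul_(m).add_(student_params[name], alpha=1.0 - m)
```

A call such as `fragment_ema_update(teacher, student, {"encoder": 0.999, "classifier": 0.99})` would let the classifier head track the student faster than the encoder, which is one plausible motivation for per-fragment rather than whole-model averaging.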