Self-augmentation has recently attracted growing research interest as a way to improve named entity recognition (NER) performance in low-resource scenarios. Token substitution and mixup are two feasible heterogeneous self-augmentation techniques for NER that can achieve effective performance with certain specialized efforts. Notably, self-augmentation may introduce noisy augmented data. Prior research has mainly resorted to heuristic rule-based constraints to reduce the noise for each specific self-augmentation method individually. In this paper, we revisit these two typical self-augmentation methods for NER and propose a unified meta-reweighting strategy that integrates them naturally. Our method is easily extensible and requires little effort to adapt to a specific self-augmentation method. Experiments on different Chinese and English NER benchmarks show that our token substitution and mixup methods, as well as their integration, achieve effective performance improvements. With the meta-reweighting mechanism, we can enhance the advantages of the self-augmentation techniques without much extra effort.
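To make the two augmentation techniques named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a simple label-wise variant of token substitution (each entity token is swapped for another token that carries the same tag) and the standard mixup interpolation of two feature vectors and their label distributions. The function names `substitute_tokens` and `mixup`, the `lexicon` structure, and the `prob` and `lam` parameters are all hypothetical, chosen for illustration only. The meta-reweighting step, which would learn a weight for each augmented example, is not shown.

```python
import random


def substitute_tokens(tokens, labels, lexicon, rng, prob=0.5):
    """Label-wise token substitution (illustrative assumption, not the paper's method).

    Each token whose label is not "O" is replaced, with probability `prob`,
    by a random token from `lexicon[label]`. Labels are left unchanged.
    """
    augmented = []
    for token, label in zip(tokens, labels):
        candidates = lexicon.get(label, [])
        if label != "O" and candidates and rng.random() < prob:
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(token)
    return augmented


def mixup(x_a, x_b, y_a, y_b, lam):
    """Standard mixup: convex combination of two inputs and of their labels.

    `x_a`, `x_b` are feature vectors and `y_a`, `y_b` are label distributions,
    all plain lists of floats of matching length. `lam` lies in [0, 1].
    """
    def interpolate(a, b):
        return [lam * p + (1.0 - lam) * q for p, q in zip(a, b)]

    return interpolate(x_a, x_b), interpolate(y_a, y_b)


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["Alice", "visited", "Paris"]
    labels = ["B-PER", "O", "B-LOC"]
    # Hypothetical lexicon mapping each tag to tokens seen with that tag.
    lexicon = {"B-PER": ["Bob", "Carol"], "B-LOC": ["Berlin", "Tokyo"]}
    print(substitute_tokens(tokens, labels, lexicon, rng, prob=1.0))

    # Mix two one-hot label vectors and two toy feature vectors.
    x_mix, y_mix = mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], lam=0.25)
    print(x_mix, y_mix)
```

In a reweighting setup, each augmented example produced this way would contribute to the training loss with its own weight, so that noisy substitutions or poorly mixed pairs can be down-weighted rather than filtered by hand-written rules.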