Data augmentation techniques have been used to improve the generalization capability of models in named entity recognition (NER) tasks. Existing augmentation methods either manipulate words in the original text, which requires hand-crafted in-domain knowledge, or leverage generative models, which rely on a dependency order among entities. To alleviate this excessive reliance on the dependency order among entities, we develop EnTDA, an entity-to-text (rather than text-to-entity) data augmentation method that decouples the dependencies between entities by adding, deleting, replacing, and swapping entities, and we use the augmented data to improve the generalization ability of the NER model. Furthermore, we introduce a diversity beam search to increase the diversity of the augmented data. Experiments on thirteen NER datasets across three tasks (flat NER, nested NER, and discontinuous NER) and two settings (full-data NER and low-resource NER) show that EnTDA consistently outperforms the baselines.
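The four entity-level operations named above (adding, deleting, replacing, and swapping) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the representation of an entity as a `(text, type)` tuple, the candidate `pool`, and the uniform random choices are all assumptions made for illustration. The entity-to-text generation step and the diversity beam search are not shown.

```python
import random

# Hypothetical representation: an entity is a (surface_text, type) tuple.
# All function names below are illustrative, not taken from the paper.

def add_entity(entities, pool, rng):
    """Insert one unseen entity from `pool` at a random position."""
    candidates = [e for e in pool if e not in entities]
    if not candidates:
        return list(entities)
    out = list(entities)
    out.insert(rng.randrange(len(out) + 1), rng.choice(candidates))
    return out

def delete_entity(entities, rng):
    """Remove one random entity, keeping at least one in the list."""
    if len(entities) <= 1:
        return list(entities)
    out = list(entities)
    del out[rng.randrange(len(out))]
    return out

def replace_entity(entities, pool, rng):
    """Replace one random entity with an unseen entity from `pool`."""
    candidates = [e for e in pool if e not in entities]
    if not entities or not candidates:
        return list(entities)
    out = list(entities)
    out[rng.randrange(len(out))] = rng.choice(candidates)
    return out

def swap_entities(entities, rng):
    """Exchange the positions of two distinct entities."""
    if len(entities) < 2:
        return list(entities)
    out = list(entities)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    entities = [("Paris", "LOC"), ("Obama", "PER")]
    pool = [("Berlin", "LOC"), ("Merkel", "PER")]
    for op in (add_entity, replace_entity):
        print(op.__name__, op(entities, pool, rng))
    for op in (delete_entity, swap_entities):
        print(op.__name__, op(entities, rng))
```

Each function returns a new list and leaves its input untouched, so one original entity list can yield several augmented variants. In an entity-to-text setup, each variant would then be passed to a generative model to produce a new sentence containing those entities.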