Understanding the intention of users and recognizing the semantic entities in their sentences, i.e., natural language understanding (NLU), is an upstream task for many natural language processing applications. One of the main challenges is collecting a sufficient amount of annotated data to train a model. Existing research on text augmentation rarely takes entities into account and therefore performs poorly on NLU tasks. To address this problem, we propose a novel NLP data augmentation technique, Entity Aware Data Augmentation (EADA), which applies a tree structure, the Entity Aware Syntax Tree (EAST), to represent sentences combined with attention over entities. Our EADA technique automatically constructs an EAST from a small amount of annotated data, and then generates a large number of training instances for intent detection and slot filling. Experimental results on four datasets show that the proposed technique significantly outperforms existing data augmentation methods in terms of both accuracy and generalization ability.
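To make the general idea concrete, the sketch below illustrates one simple form of entity-aware augmentation in Python: it swaps annotated entity spans between utterances that share the same slot type while preserving the BIO slot labels and the intent label. This is only an illustration under assumed toy data; it does not implement the paper's EAST construction or its attention on entities, and all names (ANNOTATED, collect_slot_values, augment) are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy annotations: (tokens, BIO slot tags, intent label).
ANNOTATED = [
    (["play", "jazz", "music"], ["O", "B-genre", "O"], "PlayMusic"),
    (["play", "some", "rock"], ["O", "O", "B-genre"], "PlayMusic"),
    (["book", "a", "table", "in", "paris"], ["O", "O", "O", "O", "B-city"], "BookRestaurant"),
    (["book", "a", "table", "in", "tokyo"], ["O", "O", "O", "O", "B-city"], "BookRestaurant"),
]

def collect_slot_values(data):
    """Gather the surface forms observed for each slot type across the corpus."""
    values = defaultdict(set)
    for tokens, tags, _ in data:
        span, slot = [], None
        for tok, tag in zip(tokens, tags):
            if tag.startswith("B-"):
                if span:
                    values[slot].add(tuple(span))
                span, slot = [tok], tag[2:]
            elif tag.startswith("I-") and span:
                span.append(tok)
            else:
                if span:
                    values[slot].add(tuple(span))
                span, slot = [], None
        if span:
            values[slot].add(tuple(span))
    return values

def augment(sample, slot_values, rng=random):
    """Replace each entity span with another value of the same slot type,
    regenerating consistent BIO labels and keeping the intent label."""
    tokens, tags, intent = sample
    new_tokens, new_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            slot = tag[2:]
            j = i + 1
            while j < len(tokens) and tags[j] == f"I-{slot}":
                j += 1
            replacement = list(rng.choice(sorted(slot_values[slot])))
            new_tokens.extend(replacement)
            new_tags.extend([f"B-{slot}"] + [f"I-{slot}"] * (len(replacement) - 1))
            i = j
        else:
            new_tokens.append(tokens[i])
            new_tags.append(tag)
            i += 1
    return new_tokens, new_tags, intent

slot_values = collect_slot_values(ANNOTATED)
print(augment(ANNOTATED[0], slot_values))
```

In this simplified view, augmentation only recombines entity values within a slot type; the tree-based EAST representation described in the abstract additionally captures sentence structure so that generated instances remain syntactically coherent.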