Negation is a common linguistic feature that is crucial in many language understanding tasks, yet it remains a hard problem due to the diversity of its expression across different types of text. Recent work has shown that state-of-the-art NLP models underperform on samples containing negation in various tasks, and that negation detection models do not transfer well across domains. We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking, to better incorporate negation information into language models. Extensive experiments on common benchmarks show that our proposed approach improves negation detection performance and generalizability over the strong baseline NegBERT (Khandelwal and Sawant, 2020).
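To make the idea of negation masking concrete, the following is a minimal illustrative sketch (not the paper's implementation): during masked language modeling, cue tokens drawn from an assumed hand-written negation lexicon are masked and predicted, rather than tokens chosen at random. The BERT checkpoint, the cue list, and the helper name `negation_masking` are all assumptions for illustration only.

```python
# Minimal sketch (assumed, not the authors' code): bias MLM masking toward negation cues.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

CUES = {"not", "no", "never", "without", "n't", "neither", "nor"}  # assumed cue lexicon

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def negation_masking(sentence):
    """Mask negation-cue tokens (instead of random tokens) for an MLM update."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"].clone()
    labels = torch.full_like(input_ids, -100)          # -100 positions are ignored by the MLM loss
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
    for i, tok in enumerate(tokens):
        if tok.lstrip("#").lower() in CUES:            # crude cue match on word pieces
            labels[0, i] = input_ids[0, i]             # predict the original cue token
            input_ids[0, i] = tokenizer.mask_token_id  # replace the cue with [MASK]
    return input_ids, enc["attention_mask"], labels

ids, attn, labels = negation_masking("The treatment did not reduce the symptoms.")
loss = model(input_ids=ids, attention_mask=attn, labels=labels).loss
loss.backward()  # one negation-focused MLM training step
```

In this sketch the model is pushed to recover negation cues from context, which is one plausible way to realize the negation-masking objective described above; the actual strategy in the paper may differ in cue detection and masking schedule.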