Word Sense Disambiguation (WSD) is an NLP task that aims to determine the correct sense of a word in a sentence from a discrete set of candidate senses. Although current systems achieve unprecedented performance on this task, the nonuniform distribution of word senses in the training data generally causes them to perform poorly on rare senses. To address this, we use data augmentation to increase the frequency of the least frequent senses (LFS) and thereby reduce the distributional bias of senses during training. We propose Sense-Maintained Sentence Mixup (SMSMix), a novel word-level mixup method that maintains the sense of a target word. SMSMix smoothly blends two sentences using mask prediction while preserving the relevant span, determined by saliency scores, to maintain a specific word's sense. To the best of our knowledge, this is the first attempt to apply mixup in NLP while preserving the meaning of a specific word. Through extensive experiments, we validate that our augmentation method effectively provides more information about rare senses during training while maintaining the target sense label.
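As a rough illustration of the idea of preserving a salient span while blending in a second sentence, the following toy sketch may help. It is not the authors' implementation: the function names, the fixed window width, the hand-written saliency values, and `toy_fill` (a stand-in for a real masked language model that simply copies tokens from a donor sentence) are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"


def salient_span(saliency: Sequence[float], target_idx: int, width: int) -> Tuple[int, int]:
    """Pick the contiguous window of `width` tokens that contains the target
    word and has the highest total saliency. Returns (start, end), end exclusive."""
    n = len(saliency)
    width = min(width, n)
    lo = max(0, target_idx - width + 1)
    hi = min(target_idx, n - width)
    best_start, best_score = lo, float("-inf")
    for start in range(lo, hi + 1):
        score = sum(saliency[start:start + width])
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_start + width


def mask_outside(tokens: Sequence[str], span: Tuple[int, int]) -> List[str]:
    """Replace every token outside the preserved span with a mask token."""
    start, end = span
    return [tok if start <= i < end else MASK for i, tok in enumerate(tokens)]


def toy_fill(masked: Sequence[str], donor: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for mask prediction: fill masks with donor
    tokens in order (cycling if needed). A real system would use a model."""
    out, k = [], 0
    for tok in masked:
        if tok == MASK:
            out.append(donor[k % len(donor)])
            k += 1
        else:
            out.append(tok)
    return out


def sense_preserving_mix(
    tokens: Sequence[str],
    saliency: Sequence[float],
    target_idx: int,
    donor: Sequence[str],
    width: int = 3,
    fill: Callable[[Sequence[str], Sequence[str]], List[str]] = toy_fill,
) -> Tuple[List[str], Tuple[int, int]]:
    """Keep the salient span around the target word; refill the rest."""
    span = salient_span(saliency, target_idx, width)
    return fill(mask_outside(tokens, span), donor), span


# Made-up example: "bank" (index 5) is the target word.
tokens = "she sat on the river bank at dusk".split()
saliency = [0.05, 0.10, 0.05, 0.05, 0.60, 0.90, 0.02, 0.10]  # assumed values
donor = "the fishermen walked home slowly".split()

mixed, span = sense_preserving_mix(tokens, saliency, target_idx=5, donor=donor)
print(span, " ".join(mixed))
```

In this sketch the sense label of the target word is carried over unchanged to the mixed sentence, since the tokens that most strongly signal its sense are left untouched. How the span is selected and how masked positions are predicted in SMSMix itself should be taken from the paper's method section rather than from this example.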