Word sense disambiguation (WSD) is a long-standing problem in natural language processing. One significant challenge in supervised all-words WSD is classifying the senses of the majority of words, which lie in the long tail of the distribution. For instance, 84% of the annotated words have fewer than 10 examples in the SemCor training data. This issue is more pronounced because the imbalance occurs in both the word and sense distributions. In this work, we propose MetricWSD, a non-parametric few-shot learning approach that mitigates this data imbalance. By learning to compute distances among the senses of a given word through episodic training, MetricWSD transfers knowledge (a learned metric space) from high-frequency words to infrequent ones. MetricWSD constructs training episodes tailored to word frequencies and explicitly addresses the skewed distribution, in contrast to prior work that mixes all words together when training parametric models. Without resorting to any lexical resources, MetricWSD obtains strong performance against parametric alternatives, achieving a 75.1 F1 score on the unified WSD evaluation benchmark (Raganato et al., 2017b). Our analysis further validates that infrequent words and senses enjoy significant improvement.
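The episodic, distance-based classification the abstract describes can be sketched as a prototypical-style episode: support examples for each sense of a target word are averaged into sense prototypes, and a query occurrence is assigned to the nearest prototype. This is a minimal illustration, not the paper's implementation; the toy vectors, the Euclidean distance, and names like `sense_prototypes` are assumptions for the sketch (the actual system uses contextual encoders and a learned metric space).

```python
import math
import random

def mean_vec(vectors):
    """Elementwise mean of a list of equal-length vectors (a sense prototype)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sense_prototypes(support):
    """Average each sense's support embeddings into one prototype per sense."""
    return {sense: mean_vec(embs) for sense, embs in support.items()}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_sense(query, prototypes):
    """Classify a query occurrence by its nearest sense prototype."""
    return min(prototypes, key=lambda s: euclidean(query, prototypes[s]))

# Toy episode for the word "bank": random stand-ins for contextual embeddings.
random.seed(0)
dim = 8

def noisy(center):
    return [center + random.gauss(0, 0.5) for _ in range(dim)]

support = {
    "bank%financial": [noisy(2.0) for _ in range(5)],   # 5 support examples
    "bank%river": [noisy(-2.0) for _ in range(3)],      # 3 support examples
}
protos = sense_prototypes(support)
query = noisy(2.0)  # a query drawn near the "financial" cluster
print(predict_sense(query, protos))  # prints "bank%financial"
```

Because the classifier is non-parametric over the episode's support set, the same learned distance function can be reused for rare words with only a handful of annotated examples, which is the transfer the abstract highlights.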