We present two supervised (pre-)training methods to incorporate gloss definitions from lexical resources into neural language models (LMs). The training not only improves our models' performance on Word Sense Disambiguation (WSD) but also benefits general language understanding tasks, while adding almost no parameters. We evaluate our techniques with seven different neural LMs and find that XLNet is more suitable for WSD than BERT. Our best-performing method exceeds state-of-the-art WSD techniques on the SemCor 3.0 dataset by 0.5% F1 and increases BERT's performance on the GLUE benchmark by 1.1% on average.