Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability to capture context-sensitive semantic nuances. However, little is still known about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT captures high-level sense distinctions accurately, even when only a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language-model-based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and can better exploit limited available training data. In fact, a simple feature extraction strategy based on the averaging of contextualized embeddings proves robust even when using only three training sentences per word sense, with minimal improvements beyond this small number of examples.
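The averaging-based feature extraction strategy mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how a sense is chosen once the embeddings are averaged, so nearest-neighbour matching by cosine similarity is an assumption here. The function names, the sense labels, and the toy 3-dimensional vectors (stand-ins for real BERT contextualized embeddings of the target word) are all hypothetical.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_sense_embeddings(labelled_examples):
    """Average the target-word embeddings of all training sentences per sense."""
    by_sense = defaultdict(list)
    for sense, vector in labelled_examples:
        by_sense[sense].append(vector)
    return {sense: mean_vector(vecs) for sense, vecs in by_sense.items()}


def disambiguate(target_vector, sense_embeddings):
    """Assumed decision rule: pick the sense whose averaged embedding is closest."""
    return max(
        sense_embeddings,
        key=lambda sense: cosine(target_vector, sense_embeddings[sense]),
    )


# Toy data: three training "sentences" per sense, mirroring the
# three-examples-per-sense setting. In practice each vector would be the
# contextualized embedding of the ambiguous word in one training sentence.
train = [
    ("bank%finance", [0.9, 0.1, 0.0]),
    ("bank%finance", [0.8, 0.2, 0.1]),
    ("bank%finance", [1.0, 0.0, 0.1]),
    ("bank%river", [0.1, 0.9, 0.2]),
    ("bank%river", [0.0, 1.0, 0.1]),
    ("bank%river", [0.2, 0.8, 0.0]),
]
senses = build_sense_embeddings(train)
print(disambiguate([0.7, 0.3, 0.1], senses))
```

Because each sense is reduced to a single averaged vector, adding training sentences only refines that average, which is one plausible reading of why gains flatten out after a few examples per sense.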