Analysis and Evaluation of Language Models for Word Sense Disambiguation

Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability to capture context-sensitive semantic nuances. However, little is still known about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT can accurately capture high-level sense distinctions, even when only a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of the availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language-model-based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and can better exploit limited training data. In fact, the simple feature extraction strategy of averaging contextualized embeddings proves robust even with only three training sentences per word sense, and increasing the size of this training data yields only minimal improvements.
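To make the feature extraction strategy of averaging contextualized embeddings more concrete, here is a minimal sketch: the embeddings of the target word in each training sentence are averaged into one centroid per sense, and a new occurrence is assigned to the closest centroid. It assumes the contextualized embeddings have already been extracted from a model such as BERT. The three-dimensional vectors are hypothetical toy values, the sense labels `bank_finance` and `bank_river` are illustrative, and the use of cosine similarity for the nearest-centroid decision is an assumption rather than a detail stated in the abstract.

```python
import math
from collections import defaultdict

Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    """Component-wise average of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_sense_centroids(examples: list[tuple[str, Vector]]) -> dict[str, Vector]:
    """Average the target-word embeddings of all training sentences per sense label."""
    by_sense: dict[str, list[Vector]] = defaultdict(list)
    for sense, vec in examples:
        by_sense[sense].append(vec)
    return {sense: mean_vector(vecs) for sense, vecs in by_sense.items()}


def disambiguate(query: Vector, centroids: dict[str, Vector]) -> str:
    """Pick the sense whose centroid is most similar (assumed: cosine) to the query."""
    return max(centroids, key=lambda sense: cosine(query, centroids[sense]))


# Hypothetical toy embeddings standing in for BERT outputs:
# three training sentences per sense, as in the low-data setting described above.
train = [
    ("bank_finance", [0.9, 0.1, 0.0]),
    ("bank_finance", [0.8, 0.2, 0.1]),
    ("bank_finance", [1.0, 0.0, 0.1]),
    ("bank_river", [0.1, 0.9, 0.3]),
    ("bank_river", [0.0, 1.0, 0.2]),
    ("bank_river", [0.2, 0.8, 0.4]),
]
centroids = build_sense_centroids(train)
print(disambiguate([0.85, 0.15, 0.05], centroids))  # bank_finance
```

Because each sense is summarized by a single averaged vector, adding a new sense only requires a handful of labeled sentences and no retraining of the underlying model, which is one plausible reason such a strategy can cope well with very small training sets.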