An important question concerning contextualized word embedding (CWE) models like BERT is how well they can represent different word senses, especially those in the long tail of uncommon senses. Rather than build a word sense disambiguation (WSD) system as in previous work, we investigate contextualized embedding neighborhoods directly, formulating a query-by-example nearest neighbor retrieval task and examining ranking performance for words and senses in different frequency bands. In an evaluation on two English sense-annotated corpora, we find that several popular CWE models all outperform a random baseline even for proportionally rare senses, without explicit sense supervision. However, performance varies considerably even among models with similar architectures and pretraining regimes, with especially large differences for rare word senses, revealing that CWE models are not all created equal when it comes to approximating word senses in their native representations.
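To make the query-by-example retrieval setup concrete, below is a minimal sketch, assuming bert-base-uncased via the HuggingFace transformers library. The model choice, the mean-pooling over WordPiece sub-tokens, and the helper embed_occurrence are illustrative assumptions, not the paper's exact pipeline: a target word occurrence serves as the query, and other occurrences are ranked by cosine similarity of their contextualized embeddings.

# Illustrative sketch only; model, pooling, and example sentences are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_occurrence(sentence: str, target: str) -> torch.Tensor:
    """Return the contextualized embedding of the first occurrence of
    `target` in `sentence`, mean-pooling over its WordPiece sub-tokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i : i + len(target_ids)] == target_ids:
            return hidden[i : i + len(target_ids)].mean(dim=0)
    raise ValueError(f"{target!r} not found in {sentence!r}")

# Query occurrence of "bank" (river sense) and a small candidate pool
# containing both the river sense and the finance sense.
query = embed_occurrence("She sat on the bank of the river.", "bank")
pool = [
    "He fished from the bank at dawn.",      # river sense
    "The bank raised its interest rates.",   # finance sense
    "Reeds grew along the muddy bank.",      # river sense
]
scores = [
    (torch.cosine_similarity(query, embed_occurrence(s, "bank"), dim=0).item(), s)
    for s in pool
]
# Rank candidates by similarity to the query occurrence.
for score, sent in sorted(scores, reverse=True):
    print(f"{score:.3f}  {sent}")

Under this setup, a model that captures sense distinctions in its native representations should rank same-sense occurrences above different-sense ones, which is the ranking performance the abstract refers to; no sense labels are used at retrieval time.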