Contextual word representation models have delivered substantial improvements on a multitude of NLP tasks, yet their word sense disambiguation capabilities remain poorly understood. To address this gap, we assess whether contextual word representations extracted from deep pretrained language models create distinguishable representations for the different senses of a given word. We analyze the representation geometry and find that most layers of deep pretrained language models produce highly anisotropic representations, pointing to a representation degeneration problem in contextual word representations. After accounting for anisotropy, our study further reveals variability in sense-learning capability across different language models. Finally, we propose LASeR, a 'Low Anisotropy Sense Retrofitting' approach that renders off-the-shelf representations isotropic and semantically more meaningful, resolving the representation degeneration problem as a post-processing step and performing sense enrichment of contextualized representations extracted from deep neural language models.
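As a rough illustration of the anisotropy analysis referenced above (not the paper's exact procedure), a common proxy for anisotropy is the expected cosine similarity between randomly sampled pairs of contextual embeddings: near 0 for an isotropic space, near 1 for a degenerate, cone-shaped one. The sketch below assumes the embeddings are already extracted into a matrix; the helper name `estimate_anisotropy` is hypothetical.

```python
import numpy as np

def estimate_anisotropy(embeddings: np.ndarray, n_pairs: int = 10_000,
                        seed: int = 0) -> float:
    """Estimate anisotropy as the mean cosine similarity between
    randomly sampled pairs of (contextual) word embeddings."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    mask = i != j  # drop accidental self-pairs
    a, b = embeddings[i[mask]], embeddings[j[mask]]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(cos.mean())

# Toy usage: isotropic Gaussian vectors score near 0, while adding a
# common offset (mimicking a dominant direction in deep layers, as the
# abstract describes) drives the score toward 1.
vecs = np.random.default_rng(1).standard_normal((5000, 768))
print(estimate_anisotropy(vecs))        # ~0.0 (isotropic)
print(estimate_anisotropy(vecs + 5.0))  # ~1.0 (highly anisotropic)
```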