Acronym disambiguation is the task of selecting the correct expansion of an ambiguous acronym from a dictionary given its sentence context, and it is one of the key challenges in scientific document understanding (SDU@AAAI-22). Many recent attempts address this problem by fine-tuning pre-trained masked language models (MLMs) to obtain better acronym representations. However, an acronym's meaning varies across contexts, and the corresponding phrase representations, mapped in different directions, lack discrimination across the vector space. The original representations of pre-trained MLMs are therefore not ideal for the acronym disambiguation task. In this paper, we propose SimCLAD, a Simple framework for Contrastive Learning of Acronym Disambiguation, to better capture acronym meanings. Specifically, we design a continual contrastive pre-training method that improves the pre-trained model's generalization ability by learning phrase-level contrastive distributions between the true meaning and ambiguous phrases. Results on English scientific-domain acronym disambiguation show that the proposed method outperforms other competitive state-of-the-art (SOTA) methods.
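The abstract does not specify the exact form of the contrastive objective; the sketch below is a minimal, hypothetical illustration of a phrase-level contrastive loss of the kind described, assuming a PyTorch setup where the acronym's contextual embedding and the dictionary expansions' phrase embeddings have already been computed by an MLM encoder. The function name and arguments are placeholders, not the paper's API.

```python
import torch
import torch.nn.functional as F

def phrase_contrastive_loss(context_emb: torch.Tensor,
                            true_phrase_emb: torch.Tensor,
                            negative_phrase_embs: torch.Tensor,
                            temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style sketch: pull the acronym's contextual embedding toward
    its true expansion phrase and push it away from the other (ambiguous)
    dictionary expansions. Shapes: context_emb (d,), true_phrase_emb (d,),
    negative_phrase_embs (k, d). All shapes and the temperature value are
    illustrative assumptions, not taken from the paper."""
    # Normalize so dot products become cosine similarities.
    context = F.normalize(context_emb, dim=-1)
    positive = F.normalize(true_phrase_emb, dim=-1)
    negatives = F.normalize(negative_phrase_embs, dim=-1)

    pos_sim = (context * positive).sum(-1, keepdim=True)   # (1,)
    neg_sim = negatives @ context                          # (k,)
    logits = torch.cat([pos_sim, neg_sim]) / temperature   # (1 + k,)

    # The true expansion occupies index 0 of the logits.
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits.unsqueeze(0), target)

# Toy usage with random embeddings (d = 768, k = 4 ambiguous expansions).
loss = phrase_contrastive_loss(torch.randn(768),
                               torch.randn(768),
                               torch.randn(4, 768))
```

Minimizing such a loss during continual pre-training would spread the phrase representations of competing expansions apart in the vector space, which is the discrimination property the abstract argues plain MLM representations lack.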