Idiomatic expressions (IEs), characterized by their non-compositionality, are an important part of natural language. They have long been a classical challenge for NLP, including for the pre-trained language models that drive today's state of the art. Prior work has identified deficiencies in the contextualized representations of IEs that stem from the underlying compositional paradigm of representation. In this work, we take a first-principles approach to building idiomaticity into BART, using an adapter as a lightweight non-compositional language expert trained on idiomatic sentences. The improvement over baselines (e.g., BART) is shown through both intrinsic and extrinsic evaluation: idiom embeddings score 0.19 points higher in homogeneity score for embedding clustering, and sequence accuracy is up to 25% higher on the idiom processing tasks of IE sense disambiguation and span detection.
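For readers unfamiliar with the intrinsic metric, the following is a minimal sketch of the standard homogeneity score for a clustering, h = 1 - H(C|K) / H(C), where C are gold classes and K are predicted clusters. It illustrates the metric's definition only and is not the authors' evaluation code; the function name `homogeneity` and the toy labels are assumptions made for illustration, and the abstract does not state which implementation was used.

```python
import math
from collections import Counter


def homogeneity(labels_true, labels_pred):
    """Homogeneity of a clustering: 1 - H(C|K) / H(C).

    labels_true: gold class label per item (hypothetical, e.g. one id per idiom).
    labels_pred: cluster id assigned to each item's embedding.
    Returns 1.0 when every cluster contains items of a single class.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n == 0:
        return 1.0

    class_counts = Counter(labels_true)
    cluster_counts = Counter(labels_pred)
    joint_counts = Counter(zip(labels_true, labels_pred))

    # Entropy of the gold classes, H(C), using natural logarithms.
    h_c = -sum((c / n) * math.log(c / n) for c in class_counts.values())
    if h_c == 0.0:
        # A single gold class is trivially homogeneous.
        return 1.0

    # Conditional entropy of classes given clusters, H(C|K).
    h_c_given_k = -sum(
        (n_ck / n) * math.log(n_ck / cluster_counts[k])
        for (_, k), n_ck in joint_counts.items()
    )
    return 1.0 - h_c_given_k / h_c


# Toy labels (assumed, not from the paper): two gold classes, four items.
gold = [0, 0, 1, 1]
pure_clusters = [1, 1, 0, 0]    # each cluster holds exactly one class
merged_clusters = [0, 0, 0, 0]  # everything falls into one cluster
score_pure = homogeneity(gold, pure_clusters)
score_merged = homogeneity(gold, merged_clusters)
```

The score lies in [0, 1], so the reported gain of 0.19 points is a difference on that scale. Cluster ids are arbitrary: relabeling the clusters does not change the score.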