Semantic Shift Detection (SSD) is the task of identifying, interpreting, and assessing possible changes over time in the meanings of a target word. Traditionally, SSD has been addressed by linguists and social scientists through manual, time-consuming work. In recent years, computational approaches based on Natural Language Processing and word embeddings have gained increasing attention as a way to automate SSD as much as possible. In particular, over the past three years, significant advances have been made almost exclusively with contextualised word embedding models, which can handle the multiple usages and meanings of a word and therefore better capture the related semantic shifts. In this paper, we survey the approaches based on contextualised embeddings for SSD (i.e., CSSDetection) and propose a classification framework characterised by three dimensions: meaning representation, time-awareness, and learning modality. The framework is used to i) review the measures for shift assessment, ii) compare the approaches in terms of performance, and iii) discuss current issues of scalability, interpretability, and robustness. Finally, open challenges and future research directions for CSSDetection are outlined.
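To make "measures for shift assessment" concrete, the sketch below computes two measures that appear in parts of the CSSDetection literature. Both operate on contextualised embeddings of a target word's usages drawn from two time periods. The first is the cosine distance between the two period prototypes, i.e. the mean usage embeddings (sometimes abbreviated PRT). The second is the average pairwise cosine distance across periods (APD). The function names, the toy two-dimensional vectors, and the assumption that embeddings arrive precomputed as lists of floats (for example, extracted from a BERT-style model) are all illustrative. This is a minimal sketch, not the method of any particular surveyed approach.

```python
import math
from itertools import product


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def prototype(embeddings):
    """Element-wise mean of a period's usage embeddings (assumed non-empty)."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def prt(usages_t1, usages_t2):
    """Prototype distance: cosine distance between the two periods' mean embeddings."""
    return cosine_distance(prototype(usages_t1), prototype(usages_t2))


def apd(usages_t1, usages_t2):
    """Average pairwise distance: mean cosine distance over all cross-period usage pairs."""
    pairs = list(product(usages_t1, usages_t2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings: usages in period 2 point in a clearly different direction.
usages_t1 = [[1.0, 0.0], [0.9, 0.1]]
usages_t2 = [[0.0, 1.0], [0.1, 0.9]]

print(f"PRT = {prt(usages_t1, usages_t2):.3f}")
print(f"APD = {apd(usages_t1, usages_t2):.3f}")
```

The two measures differ in what they summarise. PRT compares only the period averages, so it is zero when the averages coincide. APD also reflects how spread out individual usages are, so it stays above zero even when a period is compared with itself.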