This paper describes SChME (Semantic Change Detection with Model Ensemble), a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change. SChME uses a model ensemble that combines signals from distributional models (word embeddings) and word frequency models, where each model casts a vote indicating the probability that a word underwent semantic change according to that feature. More specifically, the input signals to our model are the cosine distance between word vectors, a neighborhood-based metric we named Mapped Neighborhood Distance (MAP), and a word frequency differential metric. Additionally, we explore alignment-based methods to investigate the importance of the landmarks used in this process. Our results show evidence that the number of landmarks used for alignment has a direct impact on the predictive performance of the model. Moreover, we show that languages that undergo less semantic change tend to benefit from using a large number of landmarks, whereas languages with more semantic change benefit from a more careful choice of the number of landmarks used for alignment.
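The ensemble idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the alignment is taken to be orthogonal Procrustes fitted on the landmark rows, each signal is turned into a vote in [0, 1] by its empirical rank, the frequency differential is taken to be the absolute log-frequency difference, and the votes are averaged. The MAP signal is omitted because the abstract does not define it. The function names (`procrustes_align`, `to_vote`, `ensemble_scores`) are hypothetical.

```python
import numpy as np

def procrustes_align(X, Y, landmarks):
    """Rotate X onto Y with an orthogonal map fitted on the landmark rows only.

    Assumption: orthogonal Procrustes is used as the alignment method.
    """
    U, _, Vt = np.linalg.svd(X[landmarks].T @ Y[landmarks])
    return X @ (U @ Vt)

def cosine_distance(A, B):
    """Row-wise cosine distance between two matrices of word vectors."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return 1.0 - num / den

def to_vote(signal):
    """Map a raw signal to [0, 1] by empirical rank (an assumed vote function)."""
    ranks = np.argsort(np.argsort(signal))
    return ranks / max(len(signal) - 1, 1)

def ensemble_scores(X1, X2, freq1, freq2, landmarks):
    """Average the cosine-distance vote and the frequency-differential vote."""
    X1_aligned = procrustes_align(X1, X2, landmarks)
    cos_vote = to_vote(cosine_distance(X1_aligned, X2))
    # Assumed frequency differential: absolute difference of log frequencies.
    freq_vote = to_vote(np.abs(np.log(freq2) - np.log(freq1)))
    return (cos_vote + freq_vote) / 2.0
```

Here `X1` and `X2` are embedding matrices for the two time periods with rows indexed by the same vocabulary, and `landmarks` is an array of row indices. Varying the size of `landmarks` is one way to probe the effect of the number of landmarks that the abstract reports.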