Semantic textual similarity is one of the open research challenges in the field of Natural Language Processing. Extensive research has been carried out in this field, and recent transformer-based models achieve near-perfect results on existing benchmark datasets such as the STS dataset and the SICK dataset. In this paper, we study the sentences in these datasets and analyze the sensitivity of various word embeddings to sentence complexity. We build a complex-sentence dataset comprising 50 sentence pairs with associated semantic similarity values provided by 15 human annotators. Readability analysis is performed to highlight the increase in complexity from the sentences in the existing benchmark datasets to those in the proposed dataset. Further, we perform a comparative analysis of the performance of various word embeddings and language models on the existing benchmark datasets and the proposed dataset. The results show that the increase in sentence complexity has a significant impact on the performance of the embedding models, resulting in a 10-20% decrease in Pearson's and Spearman's correlation.
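The evaluation protocol implied above can be sketched as follows: score each sentence pair by the cosine similarity of its two embeddings, then correlate those scores with the human annotations using Pearson's and Spearman's coefficients. This is a minimal, dependency-free illustration; the embedding vectors and human ratings shown are hypothetical placeholders, not values from the paper's dataset, and the rank computation omits tie handling for brevity.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(x):
    """Rank positions (1-based); ties not averaged, for brevity."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for rank, i in enumerate(order):
        r[i] = float(rank + 1)
    return r

def spearman(x, y):
    """Spearman correlation: Pearson on the rank-transformed scores."""
    return pearson(ranks(x), ranks(y))

# Hypothetical embeddings for three sentence pairs and their human ratings.
pairs = [
    ([0.9, 0.1, 0.2], [0.8, 0.2, 0.1]),
    ([0.1, 0.9, 0.3], [0.7, 0.1, 0.6]),
    ([0.4, 0.4, 0.8], [0.5, 0.3, 0.9]),
]
human = [4.5, 1.8, 4.0]  # illustrative 0-5 similarity annotations

model = [cosine(u, v) for u, v in pairs]
print(f"Pearson:  {pearson(model, human):.3f}")
print(f"Spearman: {spearman(model, human):.3f}")
```

A drop of 10-20% in either coefficient, as reported for the complex-sentence dataset, would indicate that the model's similarity scores track human judgments noticeably less faithfully as sentences grow harder to read.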