The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications. However, standard methods for evaluating corpus-level semantic similarity metrics have yet to be established. We propose a set of automatic and interpretable measures for assessing the characteristics of such metrics, allowing sensible comparison of their behavior. We demonstrate the effectiveness of our evaluation measures in capturing fundamental characteristics by applying them to a collection of classical and state-of-the-art metrics. Our measures reveal that recently developed metrics are becoming better at identifying semantic distributional mismatch, while classical metrics are more sensitive to surface-level perturbations of the text.