Measuring Sentence Textual Similarity (STS) is a classic task that can be applied to many downstream NLP applications such as text generation and retrieval. In this paper, we focus on unsupervised STS that works across various domains while requiring only minimal data and computational resources. Theoretically, we propose a lightweight Expectation-Correction (EC) formulation for STS computation. The EC formulation unifies unsupervised STS approaches including the cosine similarity of Additively Composed (AC) sentence embeddings, Optimal Transport (OT), and Tree Kernels (TK). Moreover, we propose the Recursive Optimal Transport Similarity (ROTS) algorithm, which captures compositional phrase semantics by composing multiple recursive EC formulations. ROTS runs in linear time and is faster than its predecessors, while also being empirically more effective and scalable. Extensive experiments on 29 STS tasks under various settings show the clear advantage of ROTS over existing approaches. Detailed ablation studies demonstrate the effectiveness of our approaches.
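The AC baseline mentioned above can be illustrated with a minimal sketch: a sentence embedding is formed by summing its word vectors, and two sentences are compared by cosine similarity. The toy vectors and names below are illustrative assumptions, not the paper's actual setup, which would use pretrained embeddings such as GloVe or word2vec.

```python
import numpy as np

# Toy word vectors for illustration only; in practice these would come
# from a pretrained embedding table (e.g. GloVe or word2vec).
vectors = {
    "cats": np.array([0.9, 0.1, 0.0]),
    "dogs": np.array([0.8, 0.2, 0.1]),
    "sit":  np.array([0.1, 0.9, 0.2]),
    "run":  np.array([0.0, 0.8, 0.4]),
}

def ac_embedding(sentence):
    """Additively Composed (AC) sentence embedding: sum of word vectors."""
    return np.sum([vectors[w] for w in sentence.split()], axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sim = cosine(ac_embedding("cats sit"), ac_embedding("dogs run"))
```

Because addition ignores word order and phrase structure, two sentences with similar word vectors score highly regardless of composition, which is precisely the limitation that the OT-based and recursive formulations above aim to address.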