Existing methods to measure sentence similarity face two challenges: (1) labeled datasets are usually limited in size, making them insufficient to train supervised neural models; (2) there is a train-test gap for unsupervised language-modeling (LM) based models: sentence-level semantics are not explicitly modeled at training time, yet these models are asked to compute semantic scores between sentences at test time. Both issues lead to inferior performance on this task. In this work, we propose a new framework to address these two issues. The proposed framework is based on the core idea that the meaning of a sentence should be defined by its contexts, and that sentence similarity can be measured by comparing the probabilities of generating two sentences given the same context. The proposed framework is able to generate a high-quality, large-scale dataset of sentence pairs annotated with semantic similarity scores in an unsupervised manner, with which the train-test gap can be largely bridged. Extensive experiments show that the proposed framework achieves significant performance boosts over existing baselines under both the supervised and unsupervised settings across different datasets.
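To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of scoring two sentences by how a language model rates each as a continuation of the same shared context. The choice of GPT-2, the example context, and the log-probability comparison are all illustrative assumptions; the paper's framework builds a full dataset-generation pipeline around this signal.

```python
# Sketch: compare p(sentence | context) for two sentences under one LM.
# Model (GPT-2) and scoring are assumptions for illustration only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_prob(context: str, sentence: str) -> float:
    """Total log-probability of `sentence` as a continuation of `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    # Leading space so the sentence tokenizes as it would mid-text.
    sent_ids = tokenizer(" " + sentence, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, sent_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # logits at position i predict the token at position i + 1
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = input_ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    # Keep only the sentence tokens (first one is predicted at ctx_len - 1).
    return token_lp[ctx_ids.size(1) - 1:].sum().item()

context = "The weather forecast said it would rain all weekend."
s1 = "I decided to stay home."
s2 = "I chose to remain indoors."
# Semantically similar sentences should receive similar conditional
# probabilities under the same context; the gap is a dissimilarity signal.
print(log_prob(context, s1), log_prob(context, s2))
```

In practice one would average such comparisons over many sampled contexts rather than a single one, which is what allows similarity scores to be collected at scale without human labels.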