Semantic search is an important task whose objective is to find, for a given query, the relevant indexed documents in a database. It requires a retrieval model that can properly learn the semantics of sentences. Transformer-based models are widely used as retrieval models due to their excellent ability to learn semantic representations, and many regularization methods suited to them have been proposed. In this paper, we propose a new regularization method, Regularized Contrastive Learning, which helps transformer-based models learn better sentence representations. It first augments several different semantic representations for every sentence, then takes them into the contrastive objective as regularizers. These contrastive regularizers can overcome overfitting and alleviate the anisotropy problem. We first evaluate our approach on 7 semantic search benchmarks with the strong pre-trained model SRoBERTA. The results show that our method is more effective at learning superior sentence representations. We then evaluate our approach on 2 challenging FAQ datasets, Cough and Faqir, whose queries and indexed documents are long. Our experimental results demonstrate that our method outperforms the baseline methods.
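To make the core idea concrete, below is a minimal PyTorch sketch of a contrastive objective in which extra augmented views of each sentence act as regularizing terms. This is an illustration under stated assumptions, not the paper's implementation: the augmentation mechanism (here, simply multiple encoder passes yielding different embeddings, e.g., via different dropout masks) and the names `regularized_contrastive_loss`, `n_views`, and `reg_weight` are hypothetical.

```python
# Hedged sketch: InfoNCE-style contrastive loss where the first
# (query, index) view pair gives the main retrieval loss and the
# remaining augmented view pairs enter the objective as regularizers.
# All names and the weighting scheme are illustrative assumptions.

import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """Standard InfoNCE loss: matching rows of `a` and `b` are positives,
    all other in-batch pairs are negatives."""
    sim = F.cosine_similarity(a.unsqueeze(1), b.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(sim, labels)


def regularized_contrastive_loss(query_views, index_views, reg_weight=0.1):
    """`query_views` / `index_views`: lists of [batch, dim] embeddings,
    one entry per augmentation (e.g., per dropout pass through the
    encoder). The first pair is the main contrastive loss; the rest
    are contrastive regularizers added with weight `reg_weight`."""
    main = info_nce(query_views[0], index_views[0])
    reg = sum(info_nce(q, i) for q, i in zip(query_views[1:], index_views[1:]))
    return main + reg_weight * reg


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, dim, n_views = 8, 32, 3
    # Stand-ins for encoder outputs under different augmentations.
    q_views = [torch.randn(batch, dim) for _ in range(n_views)]
    i_views = [torch.randn(batch, dim) for _ in range(n_views)]
    print(regularized_contrastive_loss(q_views, i_views).item())
```

In an actual training loop, each list entry would come from re-encoding the same sentence batch so that the views differ only through the augmentation, which is what lets the extra terms regularize the learned representation space.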