Personalized search plays a crucial role in improving the user search experience owing to its ability to build user profiles from historical behaviors. Previous studies have made great progress in extracting personal signals from query logs and learning user representations. However, neural personalized search depends heavily on sufficient data to train the user model, and data sparsity is an inevitable obstacle that prevents existing methods from learning high-quality user representations. Moreover, the overemphasis on final ranking quality leads to coarse data representations and impairs the generalizability of the model. To tackle these issues, we propose a Personalized Search framework with Self-supervised Learning (PSSL) to enhance data representations. Specifically, we adopt a contrastive sampling method to extract paired self-supervised information from sequences of user behaviors in query logs. Four auxiliary tasks are designed to pre-train the sentence encoder and the sequence encoder used in the ranking model. These tasks are optimized with a contrastive loss that aims to pull the representations of similar user sequences, queries, and documents closer together. Experimental results on two datasets demonstrate that our proposed model PSSL achieves state-of-the-art performance compared with existing baselines.
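The abstract does not specify the exact form of the contrastive objective. As an illustration only, a standard InfoNCE-style contrastive loss over an anchor $x$ (a user sequence, query, or document), its positive counterpart $x^{+}$ produced by contrastive sampling, and in-batch negatives could be written as

\[
\mathcal{L}_{\mathrm{cl}} \;=\; -\log \frac{\exp\!\big(\mathrm{sim}(h_{x}, h_{x^{+}})/\tau\big)}{\sum_{x' \in \mathcal{B}} \exp\!\big(\mathrm{sim}(h_{x}, h_{x'})/\tau\big)},
\]

where $h_{x}$ denotes the encoder output for $x$, $\mathcal{B}$ is the set of candidates in the batch (including $x^{+}$), $\mathrm{sim}(\cdot,\cdot)$ is a similarity function such as cosine similarity, and $\tau$ is a temperature hyperparameter. The symbols and the InfoNCE form here are illustrative assumptions, not necessarily the formulation used in PSSL; they only sketch how a loss that "closes the distance" between similar representations is commonly instantiated.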