Matching user search queries with relevant keywords bid by advertisers in real time is a crucial problem in sponsored search. In the literature, two broad sets of approaches have been explored to solve this problem: (i) Dense Retrieval (DR), which learns dense vector representations for queries and bid keywords in a shared space, and (ii) Natural Language Generation (NLG), which learns to directly generate bid keywords given queries. In this work, we first conduct an empirical study of these two approaches and show that they offer complementary benefits that are additive. In particular, a large fraction of the keywords retrieved by NLG are not retrieved by DR, and vice versa. We then show that it is possible to effectively combine the advantages of these two approaches in one model. Specifically, we propose HEARTS: a novel multi-task fusion framework in which we jointly optimize a shared encoder to perform both DR and non-autoregressive NLG. Through extensive experiments on search queries from 30+ countries spanning 20+ languages, we show that HEARTS retrieves 40.3% more high-quality bid keywords than the baseline approaches with the same GPU compute. We also demonstrate that inference with a single HEARTS model is as good as inference with two separate DR and NLG baseline models, which requires 2x the compute. Further, we show that DR models trained with the HEARTS objective are significantly better than those trained with standard contrastive loss functions. Finally, we show that the HEARTS objective can be adapted to short-text retrieval tasks other than sponsored search, where it also achieves significant performance gains.
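The abstract does not specify the HEARTS loss, so the following is only a minimal sketch of a generic multi-task objective of the kind it describes. One shared encoder feeds (a) an in-batch contrastive DR loss and (b) a non-autoregressive NLG loss in which every keyword token position is predicted in parallel, and the two losses are summed with a weight. The encoder is assumed to have already produced query/keyword embeddings and per-position token logits. All names (`dr_contrastive_loss`, `nar_generation_loss`, `joint_loss`, `nlg_weight`) are hypothetical. The in-batch softmax loss used here is one of the standard contrastive losses the paper reports outperforming, so this should not be read as the HEARTS objective itself.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def dr_contrastive_loss(queries: List[List[float]], keywords: List[List[float]]) -> float:
    """In-batch softmax (InfoNCE-style) DR loss: query i's positive is keyword i,
    and every other keyword in the batch serves as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) for k in keywords]  # dot products
        total -= log_softmax(scores)[i]
    return total / len(queries)


def nar_generation_loss(position_logits: List[List[List[float]]], targets: List[List[int]]) -> float:
    """Non-autoregressive NLG loss: each target token is predicted independently
    from the encoder output (no conditioning on previous tokens), so the loss is a
    plain average of per-position cross-entropies."""
    total, count = 0.0, 0
    for logits_seq, target_seq in zip(position_logits, targets):
        for logits, tok in zip(logits_seq, target_seq):
            total -= log_softmax(logits)[tok]
            count += 1
    return total / count


def joint_loss(queries, keywords, position_logits, targets, nlg_weight: float = 1.0) -> float:
    """Hypothetical multi-task objective: weighted sum of the DR and NLG losses,
    both computed from the outputs of the same shared encoder."""
    dr = dr_contrastive_loss(queries, keywords)
    nlg = nar_generation_loss(position_logits, targets)
    return dr + nlg_weight * nlg


if __name__ == "__main__":
    # Toy batch: two queries, their two positive keywords, and a
    # one-token keyword "generated" per query over a 3-token vocabulary.
    queries = [[1.0, 0.0], [0.0, 1.0]]
    keywords = [[1.0, 0.0], [0.0, 1.0]]
    logits = [[[2.0, 0.0, 0.0]], [[0.0, 2.0, 0.0]]]
    targets = [[0], [1]]
    print(joint_loss(queries, keywords, logits, targets, nlg_weight=0.5))
```

Summing the two losses means a single backward pass updates the shared encoder for both tasks, which is what lets one model serve both retrieval modes at inference time. How the losses are actually weighted or combined in HEARTS is not stated in the abstract.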