Synonymous keyword retrieval has become an important problem for sponsored search ever since major search engines relaxed the matching requirement of their exact match product to the synonymous level. Since synonymous relations between queries and keywords are quite scarce, the traditional information retrieval framework is inefficient in this scenario. In this paper, we propose a novel quotient space-based retrieval framework to address this problem. Treating synonymy among keywords as a mathematical equivalence relation, we can compress each set of synonymous keywords into one representative, and the corresponding quotient space greatly reduces the size of the keyword repository. An embedding-based retrieval is then conducted directly between queries and the keyword representatives. To mitigate the semantic gap of the quotient space-based retrieval, a single semantic siamese model is used to detect both the keyword-keyword and the query-keyword synonymous relations. The experiments show that our quotient space-based retrieval method greatly improves synonymous keyword retrieval performance in terms of memory cost and recall efficiency. The method has been successfully implemented in Baidu's online sponsored search system and has yielded a significant improvement in revenue.
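The quotient-space idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the union-find grouping, the choice of the lexicographically smallest keyword as class representative, the helper names (`build_quotient`, `embed`, `cosine`, `retrieve`), and the sample keywords are all assumptions made for illustration, and the bag-of-words `embed` function is only a toy stand-in for the learned siamese model.

```python
import math
from collections import defaultdict


class UnionFind:
    """Groups items into equivalence classes (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Assumption: the smallest string in a class is its representative.
            self.parent[max(ra, rb)] = min(ra, rb)


def build_quotient(keywords, synonym_pairs):
    """Map each representative to the list of keywords in its class."""
    uf = UnionFind()
    for a, b in synonym_pairs:
        uf.union(a, b)
    classes = defaultdict(list)
    for kw in keywords:
        classes[uf.find(kw)].append(kw)
    return dict(classes)


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    counts = defaultdict(int)
    for token in text.lower().split():
        counts[token] += 1
    return dict(counts)


def cosine(u, v):
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, classes, top_k=1):
    """Score the query against representatives only, then expand the classes."""
    q = embed(query)
    ranked = sorted(classes, key=lambda rep: cosine(q, embed(rep)), reverse=True)
    results = []
    for rep in ranked[:top_k]:
        results.extend(classes[rep])
    return results


# Illustrative data, not taken from the paper.
keywords = ["cheap flights", "low cost flights", "budget flights",
            "hotel deals", "hotel discounts"]
synonym_pairs = [("cheap flights", "low cost flights"),
                 ("low cost flights", "budget flights"),
                 ("hotel deals", "hotel discounts")]

classes = build_quotient(keywords, synonym_pairs)
hits = retrieve("budget flights to paris", classes)
```

In this sketch the index holds two representatives instead of five keywords, and a single match against a representative returns every keyword in its class. The toy embedding only matches on shared words, so a query that is synonymous with a class but shares no words with its representative would be missed; this is the kind of semantic gap the abstract says the siamese model is meant to mitigate.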