Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation

Model-based methods for recommender systems have been studied extensively for years. Modern recommender systems usually resort to 1) representation learning models, which define user-item preference as the distance between their embedding representations, and 2) embedding-based Approximate Nearest Neighbor (ANN) search to tackle the efficiency problem introduced by a large-scale corpus. While providing efficient retrieval, the embedding-based retrieval pattern also limits model capacity, since the form of the user-item preference measure is restricted to the distance between embedding representations. Other, more precise user-item preference measures, e.g., preference scores derived directly from a deep neural network, are computationally intractable because no efficient retrieval method exists for them and an exhaustive search over all user-item pairs is impractical. In this paper, we propose a novel method to extend ANN search to arbitrary matching functions, e.g., a deep neural network. Our main idea is to perform a greedy walk with a matching function in a similarity graph constructed from all items. To address the heterogeneity between the similarity measure used for graph construction and the user-item matching function, we propose a pluggable adversarial training task that ensures graph search with an arbitrary matching function achieves fairly high precision. Experimental results on both open-source and industrial datasets demonstrate the effectiveness of our method. The proposed method has been fully deployed in the Taobao display advertising platform and brings a considerable increase in advertising revenue. We also summarize our detailed deployment experience in this paper.
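
The greedy walk described in the abstract can be illustrated with a small sketch. The abstract does not specify the search procedure, so the code below is not the paper's implementation. It assumes the best-first beam search commonly used by graph-based ANN indexes such as HNSW, with the distance replaced by a black-box score. The names `build_knn_graph`, `greedy_graph_search`, and `toy_neural_score`, the brute-force L2 graph construction, and all parameters are hypothetical choices for illustration. The adversarial training task is not shown.

```python
import heapq
import math
import random
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def build_knn_graph(items: List[Vector], k: int) -> Dict[int, List[int]]:
    """Brute-force k-NN graph under squared L2 distance (toy stand-in for the similarity graph)."""
    graph: Dict[int, List[int]] = {}
    for i, a in enumerate(items):
        dists = [
            (sum((x - y) ** 2 for x, y in zip(a, b)), j)
            for j, b in enumerate(items)
            if j != i
        ]
        graph[i] = [j for _, j in heapq.nsmallest(k, dists)]
    return graph


def greedy_graph_search(
    graph: Dict[int, List[int]],
    score: Callable[[int], float],
    entry: int,
    top_k: int,
    beam_width: int,
) -> List[int]:
    """Best-first walk: expand the highest-scoring unexpanded node until no
    candidate can improve the current beam. `score` is an arbitrary function."""
    visited = {entry}
    s = score(entry)
    candidates = [(-s, entry)]  # max-heap on score (negated for heapq)
    results = [(s, entry)]      # min-heap holding the best `beam_width` nodes
    while candidates:
        neg_s, node = heapq.heappop(candidates)
        # Stop once the best remaining candidate is worse than the worst kept result.
        if len(results) >= beam_width and -neg_s < results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            s_nb = score(nb)
            if len(results) < beam_width or s_nb > results[0][0]:
                heapq.heappush(candidates, (-s_nb, nb))
                heapq.heappush(results, (s_nb, nb))
                if len(results) > beam_width:
                    heapq.heappop(results)
    return [item for _, item in heapq.nlargest(top_k, results)]


def toy_neural_score(user: Vector, item: Vector) -> float:
    """Hypothetical stand-in for a learned matching network: nonlinear, not a distance."""
    hidden = sum(math.tanh(u * x) for u, x in zip(user, item))
    return hidden + 0.5 * math.tanh(sum(u + x for u, x in zip(user, item)))


if __name__ == "__main__":
    rng = random.Random(0)
    items = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
    user = [rng.gauss(0, 1) for _ in range(8)]
    graph = build_knn_graph(items, k=8)

    def score_fn(i: int) -> float:
        return toy_neural_score(user, items[i])

    found = greedy_graph_search(graph, score_fn, entry=0, top_k=5, beam_width=32)
    exact = sorted(range(len(items)), key=score_fn, reverse=True)[:5]
    print("greedy:", found)
    print("exact: ", exact)
```

Because the graph here is built from L2 distance while the walk is guided by an unrelated score, the greedy result may miss some of the exact top items. That mismatch is the heterogeneity the abstract refers to.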
