This paper introduces the large-scale visual search algorithm and system infrastructure at Alibaba. We discuss the following challenges that arise in Alibaba's e-commerce setting: (a) how to handle heterogeneous image data and bridge the gap between real-shot images from user queries and online images; (b) how to index massive, frequently updated data at large scale; (c) how to train deep models for effective feature representation without extensive human annotation; and (d) how to improve user engagement by taking content quality into account. We take advantage of Alibaba's large image collection and state-of-the-art deep learning techniques to perform visual search at scale. We present solutions and implementation details for these problems and share the lessons we learned from building such a large-scale commercial visual search engine. Specifically, we introduce a fusion approach that combines model-based and search-based predictions to predict categories effectively. We also propose a deep CNN model for joint detection and feature learning, trained by mining user click behavior. A binary index engine is designed to scale up indexing without compromising recall or precision. Finally, we integrate all of these stages into an end-to-end system architecture that achieves high efficiency and scalability while adapting to real-shot images. Extensive experiments demonstrate the effectiveness of each module in our system. We hope that visual search, as deployed at Alibaba, will be more widely incorporated into today's commercial applications.
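To make the idea of a binary index more concrete, here is a minimal sketch, assuming that real-valued CNN features are binarized by sign and that candidates are ranked by Hamming distance. This is a generic illustration of binary-code retrieval, not Alibaba's engine; the function names (`binarize`, `hamming`, `hamming_search`) and the toy feature vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def binarize(feature: Sequence[float]) -> int:
    """Pack a real-valued feature into an integer bit code (hypothetical scheme).

    Bit i is set when feature[i] > 0, i.e. simple sign binarization.
    """
    code = 0
    for i, value in enumerate(feature):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Hamming distance: the number of set bits in the XOR of two codes."""
    return bin(a ^ b).count("1")


def hamming_search(query: int, index: List[Tuple[str, int]], k: int) -> List[Tuple[str, int]]:
    """Brute-force top-k search over (item_id, code) pairs, nearest first.

    Ties are broken by item_id so the ordering is deterministic.
    """
    scored = [(item_id, hamming(query, code)) for item_id, code in index]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


# Usage: index three toy items and query with a feature close to item_a.
index = [
    ("item_a", binarize([0.9, -0.2, 0.4, -0.7])),
    ("item_b", binarize([-0.5, 0.3, -0.1, 0.8])),
    ("item_c", binarize([0.6, -0.4, 0.2, 0.1])),
]
query = binarize([0.8, -0.1, 0.5, -0.3])
top = hamming_search(query, index, k=2)
print(top)  # [('item_a', 0), ('item_c', 1)]
```

The appeal of binary codes is that a distance computation reduces to an XOR and a popcount, which is far cheaper in memory and compute than floating-point distances. A production engine would presumably use fixed-width packed words, hardware popcount, and sharded indexes; the pure-Python loop above is only for readability.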