Product Retrieval (PR) and Product Grounding (PG), which seek image-level and object-level products, respectively, given a textual query, have recently attracted great interest as a way to improve the shopping experience. Owing to the lack of relevant datasets, we collect two large-scale benchmark datasets from the Taobao Mall and Live domains, with about 474k and 101k image-query pairs for PR, and manually annotate the object bounding boxes in each image for PG. Because annotating boxes is expensive and time-consuming, we also transfer knowledge from the annotated domain to the unannotated one, so as to achieve unsupervised Domain Adaptation for PG (PG-DA). We propose a {\bf D}omain {\bf A}daptive Produc{\bf t} S{\bf e}eker ({\bf DATE}) framework, which regards PR and PG as Product Seeking problems at different levels, to help the query {\bf date} the product. Concretely, we first design a semantics-aggregated feature extractor for each modality to obtain concentrated and comprehensive features for the subsequent efficient retrieval and fine-grained grounding tasks. Then, we present two cooperative seekers that simultaneously search for the image (PR) and localize the product within it (PG). In addition, we devise a domain aligner for PG-DA to alleviate the uni-modal marginal and multi-modal conditional distribution shifts between the source and target domains, and design a pseudo box generator that dynamically selects reliable instances and generates bounding boxes for further knowledge transfer. Extensive experiments show that DATE achieves satisfactory performance in fully supervised PR and PG and in unsupervised PG-DA. Our desensitized datasets will be made publicly available\footnote{\url{https://github.com/Taobao-live/Product-Seeking}}.