Same-style product retrieval plays an important role on e-commerce platforms: it aims to identify identical products that may have different text descriptions or images. It can be used to retrieve similar products across suppliers or to detect duplicate products from a single supplier. Common methods treat the image as the object to be matched, so they consider only visual features and overlook the attribute information contained in textual descriptions. As a result, they perform poorly in industries where images are less informative, such as machinery, hardware tools, and electronic components, even when an additional text-matching module is added. In this paper, we propose a unified vision-language modeling method for e-commerce same-style product retrieval that represents each product by both its textual descriptions and its visual content. It comprises a sampling strategy that collects positive pairs from user click logs under category and relevance constraints, and a novel contrastive loss unit that maps the image, text, and image+text representations into one joint embedding space. The model supports cross-modal product-to-product retrieval, as well as style transfer and user-interactive search. Offline evaluations on annotated data demonstrate its superior retrieval performance, and online tests show that it attracts more clicks and conversions. Moreover, the model has already been deployed online for similar-product retrieval on alibaba.com, the largest B2B e-commerce platform in the world.
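The abstract does not give the form of the contrastive loss, so the following is only a minimal sketch of the general idea of pulling image, text, and image+text representations of matching products into one embedding space. It assumes a symmetric InfoNCE objective averaged over all pairs of views, with in-batch negatives. The function names, the `fused` key for the image+text representation, and the temperature of 0.07 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

VIEWS = ("image", "text", "fused")  # "fused" = image+text; key names are assumed


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss.

    Row i of `a` and row i of `b` form a positive pair; every other row in
    the batch serves as a negative.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature

    def cross_entropy_on_diagonal(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


def joint_contrastive_loss(query, target, temperature=0.07):
    """Average InfoNCE over all 3 x 3 view combinations.

    `query` and `target` map each view name to a (batch, dim) array holding
    the embeddings of the two products in each positive pair.
    """
    losses = [info_nce(query[u], target[v], temperature)
              for u in VIEWS for v in VIEWS]
    return float(np.mean(losses))
```

Averaging over every view combination is one simple way to make an image-only query, a text-only query, and a combined query all land near the same product in the shared space; the actual method may weight or select these terms differently.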