电子商务产品检索与视觉语言预训练 (Delving into E-Commerce Product Retrieval with Vision-Language Pre-training) - 专知论文

会员服务 ·

0

视觉语言预训练 · 电子商务产品 · 电子商务 · 预训练 · 淘宝搜索 ·

2023 年 4 月 10 日

Delving into E-Commerce Product Retrieval with Vision-Language Pre-training

翻译：电子商务产品检索与视觉语言预训练

Xiaoyang Zheng,Fuyu Lv,Zilong Wang,Qingwen Liu,Xiaoyi Zeng

from arxiv, 5 pages, 4 figures, accepted to SIRIP 2023

E-commerce search engines comprise a retrieval phase and a ranking phase, where the first one returns a candidate product set given user queries. Recently, vision-language pre-training, combining textual information with visual clues, has been popular in the application of retrieval tasks. In this paper, we propose a novel V+L pre-training method to solve the retrieval problem in Taobao Search. We design a visual pre-training task based on contrastive learning, outperforming common regression-based visual pre-training tasks. In addition, we adopt two negative sampling schemes, tailored for the large-scale retrieval task. Besides, we introduce the details of the online deployment of our proposed method in real-world situations. Extensive offline/online experiments demonstrate the superior performance of our method on the retrieval task. Our proposed method is employed as one retrieval channel of Taobao Search and serves hundreds of millions of users in real time.

翻译：电子商务搜索引擎包括检索阶段和排名阶段，其中前者会根据用户查询返回候选产品集合。最近，将文本信息与视觉线索相结合的视觉语言预训练方法，在检索任务中得到广泛应用。本文提出了一种新颖的基于V+L预训练的方法，以解决淘宝搜索引擎中的检索问题。我们设计了一个基于对比学习的视觉预训练任务，优于常见的基于回归的视觉预训练任务。此外，我们采用了两种专为大规模检索任务量身定制的负采样方案。除此之外，我们介绍了我们的方法在实际情况下的在线部署细节。大量的离线/在线实验表明了我们的方法在检索任务中的卓越性能。我们的方法被用作淘宝搜索的一个检索通道并且能在实时中为数亿用户服务。

0

相关内容

视觉语言预训练

视觉语言预训练

【AAAI2023】基于检索增强语言模型的高效可扩展NLP，72页ppt

【AAAI2023】基于检索增强语言模型的高效可扩展NLP，72页ppt

专知会员服务

57+阅读 · 2023年2月20日

【CVPR2022】跨模态检索的协同双流视觉语言预训练模型

【CVPR2022】跨模态检索的协同双流视觉语言预训练模型

专知会员服务

21+阅读 · 2022年4月21日

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

专知会员服务

13+阅读 · 2022年3月12日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

多模态信息如何嵌入推荐系统？RecSys2021《多模态推荐系统》教程，103页ppt讲述文本、图像与图形多模态信息利用

多模态信息如何嵌入推荐系统？RecSys2021《多模态推荐系统》教程，103页ppt讲述文本、图像与图形多模态信息利用

专知会员服务

96+阅读 · 2021年10月1日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【论文推荐】自然语言处理与查询扩展综述，Natural Language Processing and Query Expansion

【论文推荐】自然语言处理与查询扩展综述，Natural Language Processing and Query Expansion

专知会员服务

44+阅读 · 2020年5月3日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

【CCL 2019】如何微调BERT进行文本分类？（How to Fine-Tune BERT for Text Classification?）

【CCL 2019】如何微调BERT进行文本分类？（How to Fine-Tune BERT for Text Classification?）

专知会员服务

84+阅读 · 2019年10月18日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新八篇推荐系统相关论文—亿级商品嵌入、主动学习、树深度模型、知识图谱、注意力感知、矩阵分解、神经个性化嵌入

【论文推荐】最新八篇推荐系统相关论文—亿级商品嵌入、主动学习、树深度模型、知识图谱、注意力感知、矩阵分解、神经个性化嵌入

专知

15+阅读 · 2018年6月15日

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

专知

33+阅读 · 2018年4月23日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

云计算中高效安全外包计算协议的研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Ontology的藏文语料库检索关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

通用Web结构化信息检索引擎的关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

语义驱动的个性化虚拟人重建技术研究

国家自然科学基金

1+阅读 · 2012年12月31日

跨语言信息检索中的机器翻译研究

国家自然科学基金

2+阅读 · 2011年12月31日

适应云计算环境的视频编码、传输与智能处理

国家自然科学基金

0+阅读 · 2011年12月31日

三维模型的分析与检索研究

国家自然科学基金

1+阅读 · 2010年12月31日

三维模型语义分析与检索研究

国家自然科学基金

2+阅读 · 2008年12月31日

面向查询的XML文本自动文摘研究

国家自然科学基金

0+阅读 · 2008年12月31日

基于双语文档反馈的跨语言信息检索研究

国家自然科学基金

0+阅读 · 2008年12月31日

Visually-augmented pretrained language models for NLP tasks without images

Arxiv

0+阅读 · 2023年5月26日

Multiview Identifiers Enhanced Generative Retrieval

Arxiv

0+阅读 · 2023年5月26日

ConvGQR: Generative Query Reformulation for Conversational Search

ConvGQR: Generative Query Reformulation for Conversational Search

Arxiv

0+阅读 · 2023年5月26日

Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval

Arxiv

0+阅读 · 2023年5月24日

Decomposing Complex Queries for Tip-of-the-tongue Retrieval

Arxiv

0+阅读 · 2023年5月24日

Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Arxiv

11+阅读 · 2023年3月10日

Pre-training Methods in Information Retrieval

Arxiv

16+阅读 · 2021年11月27日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

VIP会员

文章信息

相关主题

视觉语言预训练

电子商务产品

相关VIP内容

【AAAI2023】基于检索增强语言模型的高效可扩展NLP，72页ppt

【AAAI2023】基于检索增强语言模型的高效可扩展NLP，72页ppt

专知会员服务

57+阅读 · 2023年2月20日

【CVPR2022】跨模态检索的协同双流视觉语言预训练模型

【CVPR2022】跨模态检索的协同双流视觉语言预训练模型

专知会员服务

21+阅读 · 2022年4月21日

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

专知会员服务

13+阅读 · 2022年3月12日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

多模态信息如何嵌入推荐系统？RecSys2021《多模态推荐系统》教程，103页ppt讲述文本、图像与图形多模态信息利用

多模态信息如何嵌入推荐系统？RecSys2021《多模态推荐系统》教程，103页ppt讲述文本、图像与图形多模态信息利用

专知会员服务

96+阅读 · 2021年10月1日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【论文推荐】自然语言处理与查询扩展综述，Natural Language Processing and Query Expansion

【论文推荐】自然语言处理与查询扩展综述，Natural Language Processing and Query Expansion

专知会员服务

44+阅读 · 2020年5月3日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

【CCL 2019】如何微调BERT进行文本分类？（How to Fine-Tune BERT for Text Classification?）

【CCL 2019】如何微调BERT进行文本分类？（How to Fine-Tune BERT for Text Classification?）

专知会员服务

84+阅读 · 2019年10月18日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】多目标奖励与偏好优化：理论与算法

《无形的防御者？将定向能武器集成到反无人机框架的机遇与挑战》报告

自主化海军：海上无人系统与未来海战

迈向智能体系统规模化的科学

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新八篇推荐系统相关论文—亿级商品嵌入、主动学习、树深度模型、知识图谱、注意力感知、矩阵分解、神经个性化嵌入

【论文推荐】最新八篇推荐系统相关论文—亿级商品嵌入、主动学习、树深度模型、知识图谱、注意力感知、矩阵分解、神经个性化嵌入

专知

15+阅读 · 2018年6月15日

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

专知

33+阅读 · 2018年4月23日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

相关论文

Visually-augmented pretrained language models for NLP tasks without images

Arxiv

0+阅读 · 2023年5月26日

Multiview Identifiers Enhanced Generative Retrieval

Arxiv

0+阅读 · 2023年5月26日

ConvGQR: Generative Query Reformulation for Conversational Search

ConvGQR: Generative Query Reformulation for Conversational Search

Arxiv

0+阅读 · 2023年5月26日

Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval

Arxiv

0+阅读 · 2023年5月24日

Decomposing Complex Queries for Tip-of-the-tongue Retrieval

Arxiv

0+阅读 · 2023年5月24日

Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Arxiv

11+阅读 · 2023年3月10日

Pre-training Methods in Information Retrieval

Arxiv

16+阅读 · 2021年11月27日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

相关基金

云计算中高效安全外包计算协议的研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Ontology的藏文语料库检索关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

通用Web结构化信息检索引擎的关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

语义驱动的个性化虚拟人重建技术研究

国家自然科学基金

1+阅读 · 2012年12月31日

跨语言信息检索中的机器翻译研究

国家自然科学基金

2+阅读 · 2011年12月31日

适应云计算环境的视频编码、传输与智能处理

国家自然科学基金

0+阅读 · 2011年12月31日

三维模型的分析与检索研究

国家自然科学基金

1+阅读 · 2010年12月31日

三维模型语义分析与检索研究

国家自然科学基金

2+阅读 · 2008年12月31日

面向查询的XML文本自动文摘研究

国家自然科学基金

0+阅读 · 2008年12月31日

基于双语文档反馈的跨语言信息检索研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员