Recently, information retrieval has seen the emergence of dense retrievers, based on neural networks, as an alternative to classical sparse methods based on term frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised term-frequency methods such as BM25. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark, our unsupervised model outperforms BM25 on 11 out of 15 datasets for Recall@100. When used as pre-training before fine-tuning, either on a few thousand in-domain examples or on the large MS~MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that it leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low-resource languages such as Swahili. We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term-matching methods.
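As a minimal sketch of such a contrastive objective, assuming a similarity score $s(q, d)$ given by the dot product between the encoder representations of a query $q$ and a document $d$, a temperature $\tau$, one positive document $d^{+}$ and negatives $d^{-}_{1}, \dots, d^{-}_{K}$, the loss can take the standard InfoNCE form (the exact instantiation used in our experiments may differ):
\[
\mathcal{L}(q, d^{+}) = -\log \frac{\exp\!\left(s(q, d^{+}) / \tau\right)}{\exp\!\left(s(q, d^{+}) / \tau\right) + \sum_{i=1}^{K} \exp\!\left(s(q, d^{-}_{i}) / \tau\right)}.
\]
In an unsupervised setting, positive pairs can be built from unlabeled text alone, for instance by pairing two spans sampled from the same document, while other documents in the batch serve as negatives.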