Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive performance by training on large datasets of question-passage pairs. We investigate whether dense retrievers can be learned in a self-supervised fashion, and applied effectively without any annotations. We observe that existing pretrained models for retrieval struggle in this scenario, and propose a new pretraining scheme designed for retrieval: recurring span retrieval. We use recurring spans across passages in a document to create pseudo examples for contrastive learning. The resulting model -- Spider -- performs surprisingly well without any examples on a wide range of ODQA datasets, and is competitive with BM25, a strong sparse baseline. In addition, Spider often outperforms strong baselines like DPR trained on Natural Questions, when evaluated on questions from other datasets. Our hybrid retriever, which combines Spider with BM25, improves over its components across all datasets, and is often competitive with in-domain DPR models, which are trained on tens of thousands of examples.
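As a rough illustration of the pretraining signal described above, the Python sketch below constructs pseudo (query, positive) pairs from spans that recur across passages of the same document; in-batch passages can then serve as negatives for contrastive learning. The function names and the fixed n-gram length are illustrative assumptions, and the paper's actual span selection and query transformation are more involved.

```python
from collections import defaultdict
from itertools import combinations

def recurring_spans(passages, n=5):
    """Map each n-gram span to the indices of passages it appears in.

    Hypothetical helper: real span selection would filter by span
    length and frequency; here we simply use fixed-length n-grams.
    """
    index = defaultdict(set)
    for i, passage in enumerate(passages):
        tokens = passage.split()
        for j in range(len(tokens) - n + 1):
            index[" ".join(tokens[j:j + n])].add(i)
    # Keep only spans that recur in at least two distinct passages.
    return {span: idxs for span, idxs in index.items() if len(idxs) >= 2}

def pseudo_examples(passages, n=5):
    """Yield (query_passage, positive_passage) pairs for contrastive learning.

    One passage containing the recurring span plays the role of the
    query; another passage containing the same span is its positive.
    """
    for span, idxs in recurring_spans(passages, n).items():
        for qi, pi in combinations(sorted(idxs), 2):
            yield passages[qi], passages[pi]
```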
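The hybrid retriever mentioned in the abstract fuses dense and sparse evidence. A minimal sketch of one common fusion rule, assuming min-max score normalization and a linear interpolation weight, follows; the paper's exact combination rule may differ.

```python
def hybrid_scores(dense_scores, sparse_scores, weight=0.5):
    """Combine dense (Spider) and sparse (BM25) scores per passage ID.

    Assumes non-empty score dicts; passages missing from one ranked
    list receive a normalized score of 0 for that component.
    """
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {pid: (s - lo) / span for pid, s in scores.items()}

    d, s = normalize(dense_scores), normalize(sparse_scores)
    return {pid: weight * d.get(pid, 0.0) + (1 - weight) * s.get(pid, 0.0)
            for pid in d.keys() | s.keys()}
```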