未经监督的 " 密集检索 " 保留更好的正对称:与 " 查询 " 的采掘和代代相配的可缩放增量 (Unsupervised Dense Retrieval Deserves Better Positive Pairs: Scalable Augmentation with Query Extraction and Generation) - 专知论文

会员服务 ·

0

无监督 · state-of-the-art · Performer · Better · Extensibility ·

2022 年 12 月 17 日

Unsupervised Dense Retrieval Deserves Better Positive Pairs: Scalable Augmentation with Query Extraction and Generation

翻译：未经监督的 " 密集检索 " 保留更好的正对称:与 " 查询 " 的采掘和代代相配的可缩放增量

Rui Meng,Ye Liu,Semih Yavuz,Divyansh Agarwal,Lifu Tu,Ning Yu,Jianguo Zhang,Meghana Bhat,Yingbo Zhou

Dense retrievers have made significant strides in obtaining state-of-the-art results on text retrieval and open-domain question answering (ODQA). Yet most of these achievements were made possible with the help of large annotated datasets, unsupervised learning for dense retrieval models remains an open problem. In this work, we explore two categories of methods for creating pseudo query-document pairs, named query extraction (QExt) and transferred query generation (TQGen), to augment the retriever training in an annotation-free and scalable manner. Specifically, QExt extracts pseudo queries by document structures or selecting salient random spans, and TQGen utilizes generation models trained for other NLP tasks (e.g., summarization) to produce pseudo queries. Extensive experiments show that dense retrievers trained with individual augmentation methods can perform comparably well with multiple strong baselines, and combining them leads to further improvements, achieving state-of-the-art performance of unsupervised dense retrieval on both BEIR and ODQA datasets.

翻译：大量检索者在获取文本检索和开放域答题(ODQA)的最新结果方面取得了长足的进步。然而,大多数这些成就都是在大量附加注释的数据集的帮助下取得的,对密集检索模型的未经监督的学习仍然是一个尚未解决的问题。在这项工作中,我们探索了两类方法来创建假的查询文件配对,名为查询提取(QExt)和传输查询生成(TQGen),以无注释和可缩放的方式加强检索者培训。具体来说,QExt通过文件结构或选择突出的随机光谱提取假查询,TQGen利用为其他NLP任务(如合成)培训的生成模型来生成假查询。广泛的实验表明,经过个人增强方法培训的密度检索者可以与多个强大的基线相匹配,并将它们结合起来可以进一步改进,在BEI和ODQA数据集上实现未加控制的密集检索的最新性能。

0

相关内容

无监督

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇视频分类相关论文—层次标签推断、知识图谱、CNNs、DAiSEE、表观和关系网络、转移学习

【论文推荐】最新六篇视频分类相关论文—层次标签推断、知识图谱、CNNs、DAiSEE、表观和关系网络、转移学习

专知

13+阅读 · 2018年2月18日

Copine VII在阿尔茨海默病中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

分数排斥统计下低维相互作用量子气体的输运性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

Wnt/β-catenin与EGFR信号通路交互作用在NSCLC吉非替尼获得性耐药中的作用和机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

CuInGaSe2太阳能电池界面结构、界面态及其钝化

国家自然科学基金

0+阅读 · 2012年12月31日

基于iRGD靶向载药脂质体-微泡复合体的超声成像引导给药治疗肿瘤的研究

国家自然科学基金

0+阅读 · 2012年12月31日

Eulerian bond-cubic 模型渗流性质的数值研究

国家自然科学基金

0+阅读 · 2012年12月31日

几个非线性Schrodinger方程组模型及相关问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

Beta沸石多形体的合成与催化性质

国家自然科学基金

0+阅读 · 2011年12月31日

基于溶质-温度-相三场耦合的铝铜合金凝固偏析三维数值模拟

国家自然科学基金

0+阅读 · 2009年12月31日

基于阈值光电子－光离子符合成像的团簇光谱和解离研究

国家自然科学基金

0+阅读 · 2009年12月31日

DREEAM: Guiding Attention with Evidence for Improving Document-Level Relation Extraction

Arxiv

0+阅读 · 2023年2月17日

Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data

Arxiv

0+阅读 · 2023年2月16日

How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

Arxiv

0+阅读 · 2023年2月15日

Contrastive Triple Extraction with Generative Transformer

Arxiv

13+阅读 · 2021年2月4日

KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning

Arxiv

27+阅读 · 2021年1月21日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Meta-Learning to Cluster

Meta-Learning to Cluster

Arxiv

17+阅读 · 2019年10月30日

Graph Convolutional Networks for Text Classification

Arxiv

11+阅读 · 2018年10月17日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《太空边缘（临近空间）的武器化？军事高空平台的进展与前景》

《利用星基增强系统（SBAS）信号进行射频干扰（RFI）检测与特征分析》

美陆军在“艾布拉姆斯”坦克与“布拉德利”步战车上测试“牛蛙”反无人机炮塔

《军事领域特性及其对军事人工智能应用的影响》

相关资讯

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇视频分类相关论文—层次标签推断、知识图谱、CNNs、DAiSEE、表观和关系网络、转移学习

【论文推荐】最新六篇视频分类相关论文—层次标签推断、知识图谱、CNNs、DAiSEE、表观和关系网络、转移学习

专知

13+阅读 · 2018年2月18日

相关论文

DREEAM: Guiding Attention with Evidence for Improving Document-Level Relation Extraction

Arxiv

0+阅读 · 2023年2月17日

Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data

Arxiv

0+阅读 · 2023年2月16日

How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

Arxiv

0+阅读 · 2023年2月15日

Contrastive Triple Extraction with Generative Transformer

Arxiv

13+阅读 · 2021年2月4日

KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning

Arxiv

27+阅读 · 2021年1月21日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Meta-Learning to Cluster

Meta-Learning to Cluster

Arxiv

17+阅读 · 2019年10月30日

Graph Convolutional Networks for Text Classification

Arxiv

11+阅读 · 2018年10月17日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

相关基金

Copine VII在阿尔茨海默病中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

分数排斥统计下低维相互作用量子气体的输运性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

Wnt/β-catenin与EGFR信号通路交互作用在NSCLC吉非替尼获得性耐药中的作用和机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

CuInGaSe2太阳能电池界面结构、界面态及其钝化

国家自然科学基金

0+阅读 · 2012年12月31日

基于iRGD靶向载药脂质体-微泡复合体的超声成像引导给药治疗肿瘤的研究

国家自然科学基金

0+阅读 · 2012年12月31日

Eulerian bond-cubic 模型渗流性质的数值研究

国家自然科学基金

0+阅读 · 2012年12月31日

几个非线性Schrodinger方程组模型及相关问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

Beta沸石多形体的合成与催化性质

国家自然科学基金

0+阅读 · 2011年12月31日

基于溶质-温度-相三场耦合的铝铜合金凝固偏析三维数值模拟

国家自然科学基金

0+阅读 · 2009年12月31日

基于阈值光电子－光离子符合成像的团簇光谱和解离研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员