GPL: 用于未受监管域域调适常量检索的生成化多标签 (GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval) - 专知论文

会员服务 ·

0

GPL · 无监督 · 标注 · domain shift · 目标领域 ·

2022 年 4 月 25 日

GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

翻译：GPL: 用于未受监管域域调适常量检索的生成化多标签

Kexin Wang,Nandan Thakur,Nils Reimers,Iryna Gurevych

from arxiv, Accepted at NAACL 2022

Dense retrieval approaches can overcome the lexical gap and lead to significantly improved search results. However, they require large amounts of training data which is not available for most domains. As shown in previous work (Thakur et al., 2021b), the performance of dense retrievers severely degrades under a domain shift. This limits the usage of dense retrieval approaches to only a few domains with large training datasets. In this paper, we propose the novel unsupervised domain adaptation method Generative Pseudo Labeling (GPL), which combines a query generator with pseudo labeling from a cross-encoder. On six representative domain-specialized datasets, we find the proposed GPL can outperform an out-of-the-box state-of-the-art dense retrieval approach by up to 9.3 points nDCG@10. GPL requires less (unlabeled) data from the target domain and is more robust in its training than previous methods. We further investigate the role of six recent pre-training methods in the scenario of domain adaptation for retrieval tasks, where only three could yield improved results. The best approach, TSDAE (Wang et al., 2021) can be combined with GPL, yielding another average improvement of 1.4 points nDCG@10 across the six tasks. The code and the models are available at https://github.com/UKPLab/gpl.

翻译：高密度检索方法可以克服词汇上的差距,并导致显著改进搜索结果。但是,它们需要大量的培训数据,而大多数领域都不具备这些数据。正如先前的工作(Thakur等人,2021b)所示,密集检索器的性能在一个域变换下严重退化。这限制了密集检索方法的使用,仅局限于几个有大型培训数据集的少数领域。在本文件中,我们提议采用新的、不受监督的域域适应方法生成的生成器生成器 Peseudo Labeling(GPL),该方法将查询生成器与跨编码的假标签结合起来。在六个有代表性的域专门数据集中,我们发现拟议的GPL能够超越一个箱外的状态检索方法,达到9.3 点 nDCG@10。 GPL要求目标领域较少(无标签的)数据,而且其培训比以往方法更加有力。我们进一步调查了最近六种培训前方法在检索任务域适应方案中的作用,只有三种能够产生改进的结果。在6个域适应模型中,TPLB10 和另外1个可改进的模型。

0

相关内容

GPL

General Public License的缩写--GNU通用公共授权。大多数软件许可证剥夺共享和修改软件的自由。GNU通用公共许可证试图保证你共享和修改自由软件的自由。保证自由软件对所有用户是自由的。

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

近期必读的6篇CVPR 2020【域自适应（Domain Adaptation）】相关论文和代码

近期必读的6篇CVPR 2020【域自适应（Domain Adaptation）】相关论文和代码

专知会员服务

96+阅读 · 2020年3月24日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【浙江大学-AAAI2020】领域自适应的对抗损失，Adversarial-Learned Loss for Domain Adaptation

【浙江大学-AAAI2020】领域自适应的对抗损失，Adversarial-Learned Loss for Domain Adaptation

专知会员服务

62+阅读 · 2020年1月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

用于稠密检索的无监督领域适应方法—Generative Pseudo Labeling (GPL)

用于稠密检索的无监督领域适应方法—Generative Pseudo Labeling (GPL)

PaperWeekly

0+阅读 · 2021年12月25日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

专知

19+阅读 · 2018年6月1日

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

专知

12+阅读 · 2018年5月18日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Copine VII在阿尔茨海默病中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

HO-1通过p38信号通路调控高温诱导的牛卵巢颗粒细胞凋亡机制的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于UGC的应急响应决策支持系统关键技术研究

国家自然科学基金

12+阅读 · 2014年12月31日

新型生长抑制因子BDGI诱导的细胞自噬在乳腺癌发生中的作用及其机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

以离子液体为溶剂的纤维素/蛋白质共溶解与纺丝成形机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

花生Cu/Zn-SOD活性响应干旱胁迫的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

新型Wnt通路抑制剂SERPINA3K对角膜作用的系列研究

国家自然科学基金

0+阅读 · 2011年12月31日

新型中红外激光晶体Er3＋:CaReAlO4(Re=Y,Gd)的研究

国家自然科学基金

0+阅读 · 2009年12月31日

远红外新型Te基硫系玻璃研制及相关性质研究

国家自然科学基金

0+阅读 · 2009年12月31日

干涉SAR与LIDAR森林参数协同反演模型与方法

国家自然科学基金

0+阅读 · 2008年12月31日

Domain Transformer: Predicting Samples of Unseen, Future Domains

Domain Transformer: Predicting Samples of Unseen, Future Domains

Arxiv

0+阅读 · 2022年6月10日

ConFUDA: Contrastive Fewshot Unsupervised Domain Adaptation for Medical Image Segmentation

Arxiv

0+阅读 · 2022年6月8日

Self-supervised Domain Adaptation in Crowd Counting

Arxiv

0+阅读 · 2022年6月7日

Unsupervised Context Aware Sentence Representation Pretraining for Multi-lingual Dense Retrieval

Arxiv

0+阅读 · 2022年6月7日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Arxiv

14+阅读 · 2021年4月27日

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Arxiv

19+阅读 · 2021年4月19日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Learning in the Frequency Domain

Learning in the Frequency Domain

Arxiv

11+阅读 · 2020年3月12日

Deep Representation Learning for Domain Adaptation of Semantic Image Segmentation

Arxiv

10+阅读 · 2018年5月10日

VIP会员

文章信息

相关主题

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

近期必读的6篇CVPR 2020【域自适应（Domain Adaptation）】相关论文和代码

近期必读的6篇CVPR 2020【域自适应（Domain Adaptation）】相关论文和代码

专知会员服务

96+阅读 · 2020年3月24日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【浙江大学-AAAI2020】领域自适应的对抗损失，Adversarial-Learned Loss for Domain Adaptation

【浙江大学-AAAI2020】领域自适应的对抗损失，Adversarial-Learned Loss for Domain Adaptation

专知会员服务

62+阅读 · 2020年1月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

扩散模型中的 Transformer：图像生成及其延展应用询问 ChatGPT

281页pdf《神经网络设计入门》

【普林斯顿博士论文】以奖励推动生成式人工智能的发展：奖励引导生成的理论与方法

中文版 | 火力支援与巡飞弹药的未来（附原文）

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

用于稠密检索的无监督领域适应方法—Generative Pseudo Labeling (GPL)

用于稠密检索的无监督领域适应方法—Generative Pseudo Labeling (GPL)

PaperWeekly

0+阅读 · 2021年12月25日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

专知

19+阅读 · 2018年6月1日

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

【论文推荐】最新十篇度量学习相关论文—可量化表示、非线性度量学习、在线深度量学习、大间隔最近邻、判别深度度量、域自适应

专知

12+阅读 · 2018年5月18日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Domain Transformer: Predicting Samples of Unseen, Future Domains

Domain Transformer: Predicting Samples of Unseen, Future Domains

Arxiv

0+阅读 · 2022年6月10日

ConFUDA: Contrastive Fewshot Unsupervised Domain Adaptation for Medical Image Segmentation

Arxiv

0+阅读 · 2022年6月8日

Self-supervised Domain Adaptation in Crowd Counting

Arxiv

0+阅读 · 2022年6月7日

Unsupervised Context Aware Sentence Representation Pretraining for Multi-lingual Dense Retrieval

Arxiv

0+阅读 · 2022年6月7日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Arxiv

14+阅读 · 2021年4月27日

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Arxiv

19+阅读 · 2021年4月19日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Learning in the Frequency Domain

Learning in the Frequency Domain

Arxiv

11+阅读 · 2020年3月12日

Deep Representation Learning for Domain Adaptation of Semantic Image Segmentation

Arxiv

10+阅读 · 2018年5月10日

相关基金

Copine VII在阿尔茨海默病中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

HO-1通过p38信号通路调控高温诱导的牛卵巢颗粒细胞凋亡机制的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于UGC的应急响应决策支持系统关键技术研究

国家自然科学基金

12+阅读 · 2014年12月31日

新型生长抑制因子BDGI诱导的细胞自噬在乳腺癌发生中的作用及其机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

以离子液体为溶剂的纤维素/蛋白质共溶解与纺丝成形机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

花生Cu/Zn-SOD活性响应干旱胁迫的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

新型Wnt通路抑制剂SERPINA3K对角膜作用的系列研究

国家自然科学基金

0+阅读 · 2011年12月31日

新型中红外激光晶体Er3＋:CaReAlO4(Re=Y,Gd)的研究

国家自然科学基金

0+阅读 · 2009年12月31日

远红外新型Te基硫系玻璃研制及相关性质研究

国家自然科学基金

0+阅读 · 2009年12月31日

干涉SAR与LIDAR森林参数协同反演模型与方法

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员