SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval

October 24, 2022

Kun Zhou, Yeyun Gong, Xiao Liu, Wayne Xin Zhao, Yelong Shen, Anlei Dong, Jingwen Lu, Rangan Majumder, Ji-Rong Wen, Nan Duan, Weizhu Chen

From arXiv, 12 pages, accepted at EMNLP 2022

Sampling proper negatives from a large document pool is vital to effectively train a dense retrieval model. However, existing negative sampling strategies suffer from the uninformative or false negative problem. In this work, we empirically show that, according to the measured relevance scores, the negatives ranked around the positives are generally more informative and less likely to be false negatives. Intuitively, these negatives are neither too hard (may be false negatives) nor too easy (uninformative). They are the ambiguous negatives and need more attention during training. Thus, we propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives. Extensive experiments on four public datasets and one industry dataset show the effectiveness of our approach. We made the code and models publicly available at https://github.com/microsoft/SimXNS.
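
The idea of favoring negatives whose relevance score lies close to the positive's score can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the Gaussian-shaped weight `exp(-a * (s_neg - s_pos - b)**2)`, the hyperparameters `a` and `b`, and the example scores are all assumptions made for illustration.

```python
import math
import random


def ambiguous_sampling_probs(neg_scores, pos_score, a=1.0, b=0.0):
    """Return a sampling distribution over negatives that peaks where a
    negative's score is close to the positive's score (shifted by b).

    Assumed form (illustrative only): w_i = exp(-a * (s_i - s_pos - b)**2).
    """
    weights = [math.exp(-a * (s - pos_score - b) ** 2) for s in neg_scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_negatives(neg_ids, neg_scores, pos_score, k, a=1.0, b=0.0, seed=None):
    """Draw k negatives (with replacement) according to the distribution above."""
    rng = random.Random(seed)
    probs = ambiguous_sampling_probs(neg_scores, pos_score, a, b)
    return rng.choices(neg_ids, weights=probs, k=k)


# Hypothetical scores: d1 ranks far above the positive (possible false negative),
# d4 ranks far below it (uninformative), d2 and d3 sit near the positive.
neg_ids = ["d1", "d2", "d3", "d4"]
neg_scores = [9.0, 5.2, 4.8, 0.5]
pos_score = 5.0

probs = ambiguous_sampling_probs(neg_scores, pos_score)
sampled = sample_negatives(neg_ids, neg_scores, pos_score, k=3, seed=0)
```

Under these assumed scores, nearly all of the probability mass falls on `d2` and `d3`, the negatives ranked around the positive, while the too-hard `d1` and the too-easy `d4` are rarely drawn. Larger values of `a` would concentrate the distribution more tightly around the positive's score.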
