Pre-trained transformers have achieved success in many NLP tasks. One thread of work focuses on training bi-encoder models (i.e., dense retrievers) to effectively encode sentences or passages into single dense vectors for efficient approximate nearest neighbor (ANN) search. However, recent work has demonstrated that transformers pre-trained with masked language modeling (MLM) cannot effectively aggregate text information into a single dense vector due to the task mismatch between pre-training and fine-tuning. Therefore, computationally expensive techniques have been adopted to train dense retrievers, such as large batch sizes, knowledge distillation, or post pre-training. In this work, we present a simple approach to effectively aggregate the textual representations of a pre-trained transformer into a dense vector. Extensive experiments show that our approach improves the robustness of the single-vector approach under both in-domain and zero-shot evaluations without any computationally expensive training techniques. Our work demonstrates that MLM pre-trained transformers can be used to effectively encode text information into a single vector for dense retrieval. Code is available at: https://github.com/castorini/dhr
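To make the bi-encoder setup concrete, the sketch below encodes queries and passages into single dense vectors with an off-the-shelf MLM pre-trained transformer and scores them by dot product. It uses plain [CLS] pooling purely for illustration, not the aggregation approach proposed in this work; the backbone model and pooling choice are assumptions.

```python
# Minimal bi-encoder (dense retriever) sketch: encode texts into single dense
# vectors and score query-passage pairs by dot product, as would be done
# before building an ANN index. [CLS] pooling and bert-base-uncased are
# illustrative assumptions, not the method proposed in the paper.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

@torch.no_grad()
def encode(texts):
    # Tokenize a batch of texts and take the [CLS] token's final hidden state
    # as a single-vector representation of each text.
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=128, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state  # (batch, seq_len, dim)
    return hidden[:, 0]                           # (batch, dim) [CLS] vectors

queries = ["what is dense retrieval?"]
passages = ["Dense retrieval encodes queries and passages into vectors "
            "and searches with approximate nearest neighbors.",
            "The weather today is sunny with a light breeze."]

q_vecs = encode(queries)
p_vecs = encode(passages)

# Relevance scores: dot product between query and passage vectors.
scores = q_vecs @ p_vecs.T
print(scores)  # the on-topic passage should receive the higher score
```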