Dense retrieval, which describes the use of contextualised language models such as BERT to identify documents from a collection by leveraging approximate nearest neighbour (ANN) techniques, has been increasing in popularity. Two families of approaches have emerged, depending on whether documents and queries are represented by single or multiple embeddings. ColBERT, the exemplar of the latter, uses an ANN index and approximate scores to identify a set of candidate documents for each query embedding, which are then re-ranked using accurate document representations. In this manner, a large number of documents can be retrieved for each query, hindering the efficiency of the approach. In this work, we investigate the use of ANN scores for ranking the candidate documents, in order to decrease the number of candidate documents being fully scored. Experiments conducted on the MSMARCO passage ranking corpus demonstrate that, by using the approximate scores to cut the candidate set down to only 200 documents, we can still obtain an effective ranking with no statistically significant difference in effectiveness, while achieving a 2x speedup in efficiency.
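The candidate-pruning idea described above can be illustrated with a minimal sketch. The function names and data layout here are hypothetical (not from the paper's codebase): we assume the ANN index returns `(doc_id, approx_score)` pairs, one set per query embedding, and we keep only the top-k documents by their best approximate score before any exact ColBERT-style scoring takes place.

```python
def prune_candidates_by_ann_scores(ann_hits, k=200):
    """Reduce the candidate set using approximate ANN scores.

    ann_hits: iterable of (doc_id, approx_score) pairs, possibly with
        repeated doc_ids (one ANN lookup is made per query embedding,
        so the same document can be retrieved several times).
    k: number of candidates to retain for full, exact scoring
        (the paper finds k=200 preserves effectiveness).

    Returns the doc_ids of the top-k candidates, ranked by their
    maximum approximate score across all query embeddings.
    """
    best = {}
    for doc_id, score in ann_hits:
        # Keep the highest approximate score seen for each document.
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    # Rank documents by descending approximate score and truncate to k.
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:k]
```

Only the documents returned by this function would then be re-ranked with the accurate (and more expensive) multi-embedding scoring, which is where the efficiency gain comes from.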