BER: 信息检索模型零光评价的异种基准 (BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models) - 专知论文

会员服务 ·

0

INFORMS · MoDELS · 稳健性 · 信息检索 · Better ·

2021 年 9 月 7 日

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

翻译：BER: 信息检索模型零光评价的异种基准

Nandan Thakur,Nils Reimers,Andreas Rücklé,Abhishek Srivastava,Iryna Gurevych

from arxiv, 9 pages

Existing neural information retrieval (IR) models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their out-of-distribution (OOD) generalization capabilities. To address this, and to facilitate researchers to broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval. We leverage a careful selection of 18 publicly available datasets from diverse text retrieval tasks and domains and evaluate 10 state-of-the-art retrieval systems including lexical, sparse, dense, late-interaction and re-ranking architectures on the BEIR benchmark. Our results show BM25 is a robust baseline and re-ranking and late-interaction-based models on average achieve the best zero-shot performances, however, at high computational costs. In contrast, dense and sparse-retrieval models are computationally more efficient but often underperform other approaches, highlighting the considerable room for improvement in their generalization capabilities. We hope this framework allows us to better evaluate and understand existing retrieval systems, and contributes to accelerating progress towards better robust and generalizable systems in the future. BEIR is publicly available at https://github.com/UKPLab/beir.

翻译：现有神经信息检索(IR)模型往往在单一和狭窄的环境中进行研究,这些模型对于其分配外(OOD)一般化能力的洞察力相当有限。为了解决这个问题,并且为了便利研究人员广泛评价其模型的有效性,我们采用基准-IR(BEIR),这是信息检索的强有力和多样化的评价基准。我们从不同的文本检索任务和领域仔细选择了18个公开可获取的数据集,并评价了10个最先进的检索系统,包括词汇、稀少、密集、晚间互动和BEIR基准的重新定位结构。我们的结果显示,BM25是一个强有力的基线,并且重新排序和晚间互动模型,平均以高计算成本实现最佳零点性能。相比之下,密集和稀少的检索模型在计算上效率更高,但往往低于其他方法,突出了改进一般化能力的巨大空间。我们希望这一框架使我们能够更好地评估和理解现有的检索系统,并有助于加快未来更稳健和普遍化系统的进展。BEBI/RBIR在 http://BAGIGR/COM上公开提供。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【图神经网络多模态检索】Multi-Modal Retrieval using Graph Neural Networks

【图神经网络多模态检索】Multi-Modal Retrieval using Graph Neural Networks

专知会员服务

30+阅读 · 2020年10月9日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

65+阅读 · 2020年5月12日

【哈工大】基于文档的对话系统(DGDS)综述，A Survey of Document Grounded Dialogue Systems (DGDS)

【哈工大】基于文档的对话系统(DGDS)综述，A Survey of Document Grounded Dialogue Systems (DGDS)

专知会员服务

35+阅读 · 2020年4月30日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

33+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

KDD2020推荐系统论文聚焦

KDD2020推荐系统论文聚焦

机器学习与推荐算法

15+阅读 · 2020年6月28日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking

Arxiv

4+阅读 · 2021年10月14日

Semantic Models for the First-stage Retrieval: A Comprehensive Review

Arxiv

20+阅读 · 2021年9月17日

OntoZSL: Ontology-enhanced Zero-shot Learning

Arxiv

17+阅读 · 2021年2月15日

Deep Image Retrieval: A Survey

Arxiv

16+阅读 · 2021年1月27日

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark

Arxiv

19+阅读 · 2020年12月17日

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Arxiv

3+阅读 · 2020年3月24日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

Affective Computing for Large-Scale Heterogeneous Multimedia Data: A Survey

Arxiv

10+阅读 · 2019年10月3日

An Improved Evaluation Framework for Generative Adversarial Networks

Arxiv

3+阅读 · 2018年3月27日

A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets

Arxiv

5+阅读 · 2018年2月14日

VIP会员

文章信息

相关主题

相关VIP内容

【图神经网络多模态检索】Multi-Modal Retrieval using Graph Neural Networks

【图神经网络多模态检索】Multi-Modal Retrieval using Graph Neural Networks

专知会员服务

30+阅读 · 2020年10月9日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

65+阅读 · 2020年5月12日

【哈工大】基于文档的对话系统(DGDS)综述，A Survey of Document Grounded Dialogue Systems (DGDS)

【哈工大】基于文档的对话系统(DGDS)综述，A Survey of Document Grounded Dialogue Systems (DGDS)

专知会员服务

35+阅读 · 2020年4月30日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

33+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

KDD2020推荐系统论文聚焦

KDD2020推荐系统论文聚焦

机器学习与推荐算法

15+阅读 · 2020年6月28日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking

Arxiv

4+阅读 · 2021年10月14日

Semantic Models for the First-stage Retrieval: A Comprehensive Review

Arxiv

20+阅读 · 2021年9月17日

OntoZSL: Ontology-enhanced Zero-shot Learning

Arxiv

17+阅读 · 2021年2月15日

Deep Image Retrieval: A Survey

Arxiv

16+阅读 · 2021年1月27日

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark

Arxiv

19+阅读 · 2020年12月17日

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Arxiv

3+阅读 · 2020年3月24日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

Affective Computing for Large-Scale Heterogeneous Multimedia Data: A Survey

Arxiv

10+阅读 · 2019年10月3日

An Improved Evaluation Framework for Generative Adversarial Networks

Arxiv

3+阅读 · 2018年3月27日

A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets

Arxiv

5+阅读 · 2018年2月14日

微信扫码咨询专知VIP会员