BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models


2021 年 10 月 21 日


Nandan Thakur,Nils Reimers,Andreas Rücklé,Abhishek Srivastava,Iryna Gurevych

From arXiv. Accepted at the NeurIPS 2021 Datasets and Benchmarks Track.

Existing neural information retrieval (IR) models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their out-of-distribution (OOD) generalization capabilities. To address this, and to facilitate researchers to broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval. We leverage a careful selection of 18 publicly available datasets from diverse text retrieval tasks and domains and evaluate 10 state-of-the-art retrieval systems including lexical, sparse, dense, late-interaction and re-ranking architectures on the BEIR benchmark. Our results show BM25 is a robust baseline and re-ranking and late-interaction-based models on average achieve the best zero-shot performances, however, at high computational costs. In contrast, dense and sparse-retrieval models are computationally more efficient but often underperform other approaches, highlighting the considerable room for improvement in their generalization capabilities. We hope this framework allows us to better evaluate and understand existing retrieval systems, and contributes to accelerating progress towards better robust and generalizable systems in the future. BEIR is publicly available at https://github.com/UKPLab/beir.
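The abstract singles out BM25 as a robust lexical baseline. As a rough illustration of what such a baseline computes, the following is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the BEIR implementation: the function name `bm25_scores`, the toy corpus, whitespace tokenization, and the parameter values `k1=1.2` and `b=0.75` are all assumptions chosen for illustration, and the IDF uses the non-negative variant common in search engines.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    k1 and b are illustrative defaults, not settings taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized on whitespace.
corpus = [
    "bm25 is a lexical ranking function".split(),
    "dense retrievers embed queries and documents".split(),
    "re-ranking models rescore a candidate list".split(),
]
scores = bm25_scores("lexical ranking".split(), corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
```

Because the score depends only on term overlap and corpus statistics, nothing is trained on the target domain, which is one plausible reason a lexical scorer of this kind transfers to unseen datasets without adaptation.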

