Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models


August 4, 2022


Jingtao Zhan,Xiaohui Xie,Jiaxin Mao,Yiqun Liu,Jiafeng Guo,Min Zhang,Shaoping Ma

From arXiv; CIKM 2022 full paper

A retrieval model should not only interpolate the training data but also extrapolate well to queries that differ from the training data. While neural retrieval models have demonstrated impressive performance on ad-hoc search benchmarks, we still know little about how they perform in terms of interpolation and extrapolation. In this paper, we demonstrate the importance of evaluating these two capabilities separately. First, we examine existing ad-hoc search benchmarks from both perspectives. We investigate the distributions of the training and test data and find considerable overlap in query entities, query intents, and relevance labels. This finding implies that evaluation on these test sets is biased toward interpolation and cannot accurately reflect extrapolation capacity. Second, we propose a novel evaluation protocol that evaluates interpolation and extrapolation performance separately on existing benchmark datasets. It resamples the training and test data based on query similarity and uses the resampled datasets for training and evaluation. Finally, we use the proposed protocol to comprehensively revisit a number of widely adopted neural retrieval models. The results show that models perform differently when moving from interpolation to extrapolation. For example, representation-based retrieval models perform almost as well as interaction-based retrieval models in terms of interpolation but not extrapolation. It is therefore necessary to evaluate interpolation and extrapolation performance separately, and the proposed resampling method serves as a simple yet effective evaluation tool for future IR studies.
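The abstract says only that the protocol resamples data based on query similarity; it does not give the procedure. The following is a minimal sketch of one way such a resampling step could look, not the authors' implementation. The function names (`cosine`, `resample_training`), the use of cosine similarity over precomputed query vectors, and the fixed `threshold` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def resample_training(train_vecs, test_vecs, threshold, mode):
    """Return the indices of training queries to keep (hypothetical helper).

    mode="interpolation": keep training queries that are similar to at
        least one test query (max cosine similarity >= threshold).
    mode="extrapolation": keep only training queries that are dissimilar
        to every test query (max cosine similarity < threshold).
    """
    if mode not in ("interpolation", "extrapolation"):
        raise ValueError("mode must be 'interpolation' or 'extrapolation'")
    kept = []
    for i, train_vec in enumerate(train_vecs):
        max_sim = max(cosine(train_vec, q) for q in test_vecs)
        is_similar = max_sim >= threshold
        if is_similar == (mode == "interpolation"):
            kept.append(i)
    return kept


# Toy 2-D "query embeddings" (assumed inputs; real ones would come from an encoder).
train = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
test = [(0.9, 0.1)]

interp_idx = resample_training(train, test, threshold=0.9, mode="interpolation")
extrap_idx = resample_training(train, test, threshold=0.9, mode="extrapolation")
print(interp_idx, extrap_idx)
```

Under this sketch, a model trained on the `interpolation` subset and scored on the test queries would measure interpolation, while training on the `extrapolation` subset and scoring on the same test queries would measure extrapolation. The threshold and the similarity measure are tunable choices here, not values taken from the paper.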

