Dealing with Typos for BERT-based Passage Retrieval and Ranking


August 27, 2021


Shengyao Zhuang, Guido Zuccon

From arXiv. Short paper, accepted at the EMNLP 2021 main conference.

Passage retrieval and ranking is a key task in open-domain question answering and information retrieval. Current effective approaches mostly rely on retrievers and rankers built on pre-trained deep language models. These methods have been shown to effectively model the semantic matching between queries and passages, also in the presence of keyword mismatch, i.e. passages that are relevant to a query but do not contain important query tokens. In this paper we consider the Dense Retriever (DR), a passage retrieval method, and the BERT re-ranker, a popular passage re-ranking method. In this context, we formally investigate how these models respond and adapt to a specific type of keyword mismatch: that caused by typos in query keywords. Through empirical investigation, we find that typos can lead to a significant drop in retrieval and ranking effectiveness. We then propose a simple typos-aware training framework for the DR and the BERT re-ranker to address this issue. Our experimental results on the MS MARCO passage ranking dataset show that, with our proposed typos-aware training, the DR and the BERT re-ranker can become robust to typos in queries, resulting in significantly improved effectiveness compared to models trained without appropriately accounting for typos.
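The abstract does not spell out how typos enter training. As a minimal sketch, assuming that typos-aware training means augmenting training queries with synthetic character-level typos (random insertion, deletion, substitution, and swapping of neighbouring characters), each applied to a clean query with probability `p`, the query-side augmentation could look like the code below. The function names, the set of typo operations, and the default `p=0.5` are illustrative assumptions, not details confirmed by this abstract.

```python
import random
import string

# Hypothetical typo operations (an assumption, not taken from the abstract).
# Each perturbs one character of a single word and never introduces spaces,
# so the number of words in the query is preserved.

def _rand_insert(word, rng):
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]

def _rand_delete(word, rng):
    if len(word) < 2:  # keep at least one character so the word survives
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]

def _rand_substitute(word, rng):
    # May occasionally pick the same letter and leave the word unchanged.
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]

def _swap_neighbors(word, rng):
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

TYPO_OPS = [_rand_insert, _rand_delete, _rand_substitute, _swap_neighbors]

def add_typo(query, rng):
    """Return a copy of `query` with one typo in one randomly chosen word."""
    words = query.split()
    if not words:
        return query
    i = rng.randrange(len(words))
    op = rng.choice(TYPO_OPS)
    words[i] = op(words[i], rng)
    return " ".join(words)

def typos_aware_queries(queries, p=0.5, seed=None):
    """Hypothetical helper: replace each query by a typo'd variant with probability p."""
    rng = random.Random(seed)
    return [add_typo(q, rng) if rng.random() < p else q for q in queries]
```

In such a setup, the perturbed queries would replace the clean ones only when building training batches for the DR or the BERT re-ranker; passages stay unchanged. Evaluation can then compare each model on clean queries and on typo'd versions of the same queries.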

