以查询驱动的长文件排级的片段选择 (Query-driven Segment Selection for Ranking Long Documents) - 专知论文

会员服务 ·

0

state-of-the-art · 秩 · Processing（编程语言） · 训练数据 · Performer ·

2021 年 9 月 10 日

Query-driven Segment Selection for Ranking Long Documents

翻译：以查询驱动的长文件排级的片段选择

Youngwoo Kim,Razieh Rahimi,Hamed Bonab,James Allan

from arxiv, 5 pages, 0 figure

Transformer-based rankers have shown state-of-the-art performance. However, their self-attention operation is mostly unable to process long sequences. One of the common approaches to train these rankers is to heuristically select some segments of each document, such as the first segment, as training data. However, these segments may not contain the query-related parts of documents. To address this problem, we propose query-driven segment selection from long documents to build training data. The segment selector provides relevant samples with more accurate labels and non-relevant samples which are harder to be predicted. The experimental results show that the basic BERT-based ranker trained with the proposed segment selector significantly outperforms that trained by the heuristically selected segments, and performs equally to the state-of-the-art model with localized self-attention that can process longer input sequences. Our findings open up new direction to design efficient transformer-based rankers.

翻译：以变换器为基础的排层器显示最先进的性能。但是,它们的自我注意操作大多无法处理长序列。培训这些排层器的常见办法之一是,将每个文件中的某些部分(例如第一部分)作为培训数据进行超自然选择,例如第一部分作为培训数据。但是,这些部分可能并不包含文件中与查询有关的部分。为了解决这一问题,我们建议从长篇文档中选择由查询驱动的段段,以建立培训数据。分区选择器为相关样本提供了更准确的标签和非相关样本,而这些样本难以预测。实验结果显示,以BERT为基础的基本排层组级器受过拟议的部分选取培训,或明显优于由超自然选定部分所培训的外形,其表现与最先进的模型相同,具有本地化的自我意识,可以处理较长的输入序列。我们的发现为设计高效的变压器排层打开了新的方向。

0

相关内容

state-of-the-art

state-of-the-art

【ICCV2021】基于耦合语义注意力的弱监督目标定位

专知会员服务

16+阅读 · 2021年8月2日

基于双注意力机制和迁移学习的跨领域推荐模型

专知会员服务

48+阅读 · 2020年10月20日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

专知会员服务

31+阅读 · 2020年5月20日

【快讯】CVPR2020结果出炉，1470篇上榜，你的paper中了吗？

【快讯】CVPR2020结果出炉，1470篇上榜，你的paper中了吗？

专知会员服务

51+阅读 · 2020年2月24日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【论文】Awesome Relation Classification Paper（关系分类）（PART II）

【论文】Awesome Relation Classification Paper（关系分类）（PART II）

AINLP

15+阅读 · 2019年8月12日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

专知

26+阅读 · 2018年5月22日

已删除

将门创投

7+阅读 · 2018年4月25日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

论文浅尝 | Leveraging Knowledge Bases in LSTMs

论文浅尝 | Leveraging Knowledge Bases in LSTMs

开放知识图谱

6+阅读 · 2017年12月8日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

Long Short-Term Transformer for Online Action Detection

Arxiv

0+阅读 · 2021年10月28日

An AI-based Approach for Tracing Content Requirements in Financial Documents

Arxiv

0+阅读 · 2021年10月28日

Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning

Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning

Arxiv

0+阅读 · 2021年10月28日

Learning to Rehearse in Long Sequence Memorization

Arxiv

8+阅读 · 2021年6月2日

Multi-Stage Document Ranking with BERT

Arxiv

5+阅读 · 2019年10月31日

CEDR: Contextualized Embeddings for Document Ranking

Arxiv

4+阅读 · 2019年8月19日

End-to-End Learning for Answering Structured Queries Directly over Text

Arxiv

3+阅读 · 2018年11月16日

Learning to Focus when Ranking Answers

Learning to Focus when Ranking Answers

Arxiv

5+阅读 · 2018年8月8日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Learning a Deep Listwise Context Model for Ranking Refinement

Arxiv

4+阅读 · 2018年4月23日

VIP会员

文章信息

相关主题

state-of-the-art

Processing（编程语言）

相关VIP内容

【ICCV2021】基于耦合语义注意力的弱监督目标定位

专知会员服务

16+阅读 · 2021年8月2日

基于双注意力机制和迁移学习的跨领域推荐模型

专知会员服务

48+阅读 · 2020年10月20日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

基于Transformer嵌入模型的个性化产品搜索，A Transformer-based Embedding Model for Personalized Product Search

专知会员服务

31+阅读 · 2020年5月20日

【快讯】CVPR2020结果出炉，1470篇上榜，你的paper中了吗？

【快讯】CVPR2020结果出炉，1470篇上榜，你的paper中了吗？

专知会员服务

51+阅读 · 2020年2月24日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

【普林斯顿博士论文】在线学习：优化、控制与学习理论

不确定环境下无人机三维路径规划研究 | 221页

【NeurIPS2025】《LeapFactual：基于条件流匹配的可靠视觉反事实解释》

大语言模型将如何改变军事指挥结构

相关资讯

【论文】Awesome Relation Classification Paper（关系分类）（PART II）

【论文】Awesome Relation Classification Paper（关系分类）（PART II）

AINLP

15+阅读 · 2019年8月12日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

专知

26+阅读 · 2018年5月22日

已删除

将门创投

7+阅读 · 2018年4月25日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

论文浅尝 | Leveraging Knowledge Bases in LSTMs

论文浅尝 | Leveraging Knowledge Bases in LSTMs

开放知识图谱

6+阅读 · 2017年12月8日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

相关论文

Long Short-Term Transformer for Online Action Detection

Arxiv

0+阅读 · 2021年10月28日

An AI-based Approach for Tracing Content Requirements in Financial Documents

Arxiv

0+阅读 · 2021年10月28日

Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning

Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning

Arxiv

0+阅读 · 2021年10月28日

Learning to Rehearse in Long Sequence Memorization

Arxiv

8+阅读 · 2021年6月2日

Multi-Stage Document Ranking with BERT

Arxiv

5+阅读 · 2019年10月31日

CEDR: Contextualized Embeddings for Document Ranking

Arxiv

4+阅读 · 2019年8月19日

End-to-End Learning for Answering Structured Queries Directly over Text

Arxiv

3+阅读 · 2018年11月16日

Learning to Focus when Ranking Answers

Learning to Focus when Ranking Answers

Arxiv

5+阅读 · 2018年8月8日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Learning a Deep Listwise Context Model for Ranking Refinement

Arxiv

4+阅读 · 2018年4月23日

微信扫码咨询专知VIP会员