Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems - Zhuanzhi Papers


January 15, 2022

Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems


Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti

From arXiv, preprint

Large transformer models can substantially improve the Answer Sentence Selection (AS2) task, but their high computational cost prevents their use in many real-world applications. In this paper, we explore the following research question: how can we make AS2 models more accurate without significantly increasing their model complexity? To address this question, we propose the Multiple Heads Student architecture (MHS), an efficient neural network designed to distill an ensemble of large transformers into a single smaller model. An MHS model consists of two components: a stack of transformer layers that encodes the inputs, and a set of ranking heads, each of which is trained by distilling a different large transformer architecture. Unlike traditional distillation techniques, our approach uses the individual models in the ensemble as teachers in a way that preserves the diversity of the ensemble members. The resulting model captures the knowledge of different types of transformer models using just a few extra parameters. We show the effectiveness of MHS on three English datasets for AS2; our proposed approach outperforms all single-model distillations we consider, rivaling the state-of-the-art large AS2 models that have 2.7x more parameters and run 2.5x slower.
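The two-component design described in the abstract (one shared encoder, several ranking heads, each head distilled from a different teacher) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `encode` function is a toy stand-in for the transformer stack, the heads are plain linear layers with a sigmoid, the teacher scores are made-up numbers, the loss is a mean squared error, and averaging the heads at inference time is one plausible way to combine them.

```python
import math
import random


def encode(question, candidate, dim=8):
    """Toy stand-in for the shared transformer encoder (hypothetical)."""
    rng = random.Random(f"{question}\n{candidate}")
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class RankingHead:
    """One linear ranking head with a sigmoid output."""

    def __init__(self, dim, seed):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        self.b = 0.0

    def score(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def distill_step(self, batch, lr=0.1):
        """One gradient step on the mean squared error against ONE teacher.

        batch: list of (features, teacher_score) pairs.
        Returns the loss measured before the update.
        """
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        loss = 0.0
        for x, t in batch:
            p = self.score(x)
            loss += (p - t) ** 2
            dz = 2.0 * (p - t) * p * (1.0 - p)  # d(loss)/d(pre-activation)
            for i, xi in enumerate(x):
                grad_w[i] += dz * xi
            grad_b += dz
        n = len(batch)
        self.w = [wi - lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b -= lr * grad_b / n
        return loss / n


def rank(question, candidates, heads):
    """Rank candidates by the average of the heads' scores (an assumption)."""
    def avg_score(c):
        x = encode(question, c)
        return sum(h.score(x) for h in heads) / len(heads)
    return sorted(candidates, key=avg_score, reverse=True)


question = "Who wrote Hamlet?"
candidates = [
    "Hamlet was written by William Shakespeare.",
    "Hamlet is set in Denmark.",
    "The play has five acts.",
]
# Made-up teacher scores: one list per teacher, one score per candidate.
teacher_scores = {
    "teacher_a": [0.9, 0.2, 0.1],
    "teacher_b": [0.8, 0.3, 0.2],
    "teacher_c": [0.95, 0.1, 0.15],
}

features = [encode(question, c) for c in candidates]
heads = [RankingHead(dim=8, seed=i) for i in range(len(teacher_scores))]

# Each head is distilled from its own teacher, so the teachers' outputs are
# never merged into a single training target.
losses = {}
for head, (name, scores) in zip(heads, teacher_scores.items()):
    batch = list(zip(features, scores))
    history = [head.distill_step(batch) for _ in range(300)]
    losses[name] = (history[0], history[-1])

ranking = rank(question, candidates, heads)
```

Because the encoder is shared and each head adds only a handful of weights, the extra cost of keeping one head per teacher stays small, which is the trade-off the abstract points to. In this toy version the encoder is frozen; a real student would presumably train the encoder jointly with the heads.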


