Human Interpretation and Exploitation of Self-attention Patterns in Transformers: A Case Study in Extractive Summarization - Zhuanzhi Papers

December 10, 2021

Raymond Li, Wen Xiao, Lanjun Wang, Giuseppe Carenini

The transformer multi-head self-attention mechanism has been thoroughly investigated recently. On one hand, researchers are interested in understanding why and how transformers work. On the other hand, they propose new attention augmentation methods to make transformers more accurate, efficient and interpretable. In this paper, we synergize these two lines of research in a human-in-the-loop pipeline to first find important task-specific attention patterns. Then those patterns are applied, not only to the original model, but also to smaller models, as a human-guided knowledge distillation process. The benefits of our pipeline are demonstrated in a case study with the extractive summarization task. After finding three meaningful attention patterns in the popular BERTSum model, experiments indicate that when we inject such patterns, both the original and the smaller model show improvements in performance and arguably interpretability.
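The abstract does not say what the three patterns are or how they are injected, so the following is only a minimal sketch of one way a fixed pattern could constrain a single attention head. The "same sentence" pattern, the function names, and the masking approach are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def same_sentence_mask(sentence_ids):
    """Hypothetical pattern: token i may attend to token j only if both
    tokens belong to the same sentence. Returns a boolean (n, n) matrix."""
    ids = np.asarray(sentence_ids)
    return ids[:, None] == ids[None, :]


def attention_with_pattern(q, k, v, mask):
    """Scaled dot-product attention for one head, with a fixed boolean
    pattern applied to the scores before the softmax (an assumed
    injection mechanism, chosen here only for simplicity)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Positions outside the pattern get a very large negative score,
    # so their attention weight becomes zero after the softmax.
    scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

Because every token shares a sentence with itself, each row of the mask has at least one allowed position, so each row of attention weights still sums to one.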

