[Article | An Illustrated Guide to the Self-Attention Mechanism] "Illustrated: Self-Attention" by Raimi Karim - Zhuanzhi VIP

BERT · Machine Learning · Transformer · Pre-trained Language Models · Google AI

November 18, 2019

Zhuanzhi Member Services

Article title

Illustrated: Self-Attention

Article summary

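The author has long worked on natural language processing within artificial intelligence. Google's AI team has recently built many language models on top of language pre-training, the best known being BERT. The article takes as its starting point BERT and its recently popular variants (RoBERTa, ALBERT, SpanBERT, DistilBERT, SesameBERT, SemBERT, MobileBERT, TinyBERT and CamemBERT), all of which bring the attention mechanism into deep learning models. The subject is not only the architectures that carry the name "BERT" but, more precisely, Transformer-based architectures. These are used mainly to model language understanding tasks; they avoid recurrence in the neural network and rely entirely on self-attention to draw global dependencies between inputs and outputs. But what is the math behind this? That is the topic of the article.

The main content of the article is a walk-through of the mathematical operations involved in a self-attention module. By the end, you should be able to write a self-attention module from scratch. The article does not aim to provide the intuition behind the different numerical representations and mathematical operations in the module, nor to explain why and exactly how self-attention is used in Transformers (the author believes plenty has been written on that already). The difference between attention and self-attention is also not covered in detail.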
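As a rough illustration of the operations the summary refers to, here is a minimal single-head self-attention sketch in plain Python. It assumes the scaled dot-product form commonly used in Transformer-based architectures (queries, keys and values obtained by linear projection, scores divided by the square root of the key dimension, then a softmax), which may differ in detail from the steps in Karim's article. The function names, the toy input `x` and the weight matrices `w_q`, `w_k`, `w_v` are all hypothetical values chosen for this example.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:        n tokens, each a list of d_model floats
    w_q, w_k: d_model x d_k projection matrices
    w_v:      d_model x d_v projection matrix
    Returns (outputs, weights): n x d_v outputs and n x n attention weights.
    """
    # Project the same input three ways: queries, keys, values.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)

    # Score every query against every key, scaled by sqrt(d_k).
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]

    # Each row of scores becomes a distribution over the input tokens.
    weights = [softmax(row) for row in scores]

    # Each output is the weighted sum of the value vectors.
    return matmul(weights, v), weights


# Hypothetical toy data: 3 tokens of dimension 4, projected to dimension 3.
x = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 2.0, 0.0, 2.0],
     [1.0, 1.0, 1.0, 1.0]]
w_q = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
w_k = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
w_v = [[0.0, 2.0, 0.0], [0.0, 3.0, 0.0], [1.0, 0.0, 3.0], [1.0, 1.0, 0.0]]

outputs, weights = self_attention(x, w_q, w_k, w_v)
for row in outputs:
    print([round(value, 3) for value in row])
```

Each row of `weights` sums to 1, so every output row is a convex combination of the value vectors. In practice the projection matrices are learned during training rather than written by hand, and a tensor library would replace the list-based `matmul`.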

About the author

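Raimi Karim has long worked in artificial intelligence research and is an expert in machine learning, specializing in natural language processing. In his research he argues that machine learning should be oriented toward practice and real-world use and aimed at solving current problems, and that AI needs commercial drivers to sustain long-term development.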

Related content

BERT

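BERT, short for Bidirectional Encoder Representations from Transformers, is a method for pre-training language representations: a general-purpose "language understanding" model is trained on a large text corpus (such as Wikipedia) and then used for downstream NLP tasks such as machine translation and question answering.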
