Understanding the inner workings of neural network models is a crucial step for rationalizing their outputs and refining their architectures. Transformer-based models are at the core of recent natural language processing, and they have typically been analyzed through their attention patterns, since their defining feature is contextualizing each input word with its surrounding words via the attention mechanism. In this study, we analyze this inner contextualization by considering all of the components, including the feed-forward block (i.e., a feed-forward layer and its surrounding residual and normalization layers), as well as the attention. Our experiments with masked language models show that each of these previously overlooked components does modify the degree of contextualization when processing special word-word pairs (e.g., those consisting of named entities). Furthermore, we find that some components cancel each other's effects. Our results could update the typical view of each component's role in the Transformer layer (e.g., that attention performs contextualization while the other components serve different roles).