变换者如何学习主题结构:通向机械理解</s> (How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding) - 专知论文

会员服务 ·

0

可理解性 · Learning · 变换 · Analysis · 层 ·

2023 年 3 月 7 日

How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding

翻译：变换者如何学习主题结构:通向机械理解

Yuchen Li,Yuanzhi Li,Andrej Risteski

While the successes of transformers across many domains are indisputable, accurate understanding of the learning mechanics is still largely lacking. Their capabilities have been probed on benchmarks which include a variety of structured and reasoning tasks -- but mathematical understanding is lagging substantially behind. Recent lines of work have begun studying representational aspects of this question: that is, the size/depth/complexity of attention-based networks to perform certain tasks. However, there is no guarantee the learning dynamics will converge to the constructions proposed. In our paper, we provide fine-grained mechanistic understanding of how transformers learn "semantic structure", understood as capturing co-occurrence structure of words. Precisely, we show, through a combination of experiments on synthetic data modeled by Latent Dirichlet Allocation (LDA), Wikipedia data, and mathematical analysis that the embedding layer and the self-attention layer encode the topical structure. In the former case, this manifests as higher average inner product of embeddings between same-topic words. In the latter, it manifests as higher average pairwise attention between same-topic words. The mathematical results involve several assumptions to make the analysis tractable, which we verify on data, and might be of independent interest as well.

翻译：虽然在许多领域变压器的成功是无可争议的,但是对学习机理的准确理解在很大程度上仍然缺乏。它们的能力已经根据包括各种结构化和推理性任务在内的基准进行了调查,但数学理解却大大落后。最近的工作线已开始研究这一问题的代表性方面:即关注网络的大小/深度/复杂性,以完成某些任务。然而,不能保证学习动态会与提议的构思相融合。在我们的文件中,我们提供了精细的机械化理解,说明变压器如何学会“结构”,被理解为捕捉同位词结构。准确地说,我们通过将Lenttant Dirichlet分配(LDA)、维基百科数据以及数学分析模型合成数据模型的实验结合起来,表明嵌入层和自留层对主题结构进行编码。在前一种情况下,这表现为同一主题词之间嵌入的较高平均内产物。在后一种词中,它显示同一主题词之间的平均关注度较高。数学结果涉及若干项假设,使我们能够进行独立的数据分析,从而可以核实。</s>

0

相关内容

可理解性

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

专知会员服务

51+阅读 · 2022年10月22日

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

专知会员服务

72+阅读 · 2022年7月11日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

中药注射剂不良反应随机森林信号模型建立及免疫毒理学评价方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

空间分数阶Schr？dinger方程的时间分裂谱方法

国家自然科学基金

0+阅读 · 2014年12月31日

Orexin/OX1R激动FOXO1/Atg7干预胰岛β细胞自噬的机制及其在胰岛功能缺陷中的意义

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

奇性空间上的几何分析

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

探寻与高功能孤独症和Asperger综合征相关的拷贝数变异

国家自然科学基金

0+阅读 · 2009年12月31日

基于乙酰丙酮的离子液体在溶液中的结构及催化活性研究

国家自然科学基金

0+阅读 · 2008年12月31日

小电导Ca2+激活K+通道与ryanodine受体功能性偶联的研究

国家自然科学基金

0+阅读 · 2008年12月31日

Pretrain on just structure: Understanding linguistic inductive biases using transfer learning

Arxiv

0+阅读 · 2023年4月25日

Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Arxiv

11+阅读 · 2023年3月10日

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Arxiv

15+阅读 · 2020年3月31日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

AliMe KBQA: Question Answering over Structured Knowledge for E-commerce Customer Service

AliMe KBQA: Question Answering over Structured Knowledge for E-commerce Customer Service

Arxiv

23+阅读 · 2019年12月12日

Learning Discrete Structures for Graph Neural Networks

Arxiv

17+阅读 · 2019年3月28日

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

Arxiv

11+阅读 · 2018年12月8日

Learning with Interpretable Structure from RNN

Arxiv

19+阅读 · 2018年10月25日

Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling

Arxiv

16+阅读 · 2018年1月31日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

专知会员服务

51+阅读 · 2022年10月22日

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

专知会员服务

72+阅读 · 2022年7月11日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

【论文推荐】最新六篇知识图谱相关论文—Zero-shot识别、卷积二维知识图谱、变分知识图谱推理、张量分解、推荐

专知

50+阅读 · 2018年4月25日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

Pretrain on just structure: Understanding linguistic inductive biases using transfer learning

Arxiv

0+阅读 · 2023年4月25日

Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Arxiv

11+阅读 · 2023年3月10日

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Arxiv

15+阅读 · 2020年3月31日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

AliMe KBQA: Question Answering over Structured Knowledge for E-commerce Customer Service

AliMe KBQA: Question Answering over Structured Knowledge for E-commerce Customer Service

Arxiv

23+阅读 · 2019年12月12日

Learning Discrete Structures for Graph Neural Networks

Arxiv

17+阅读 · 2019年3月28日

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

Arxiv

11+阅读 · 2018年12月8日

Learning with Interpretable Structure from RNN

Arxiv

19+阅读 · 2018年10月25日

Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling

Arxiv

16+阅读 · 2018年1月31日

相关基金

中药注射剂不良反应随机森林信号模型建立及免疫毒理学评价方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

空间分数阶Schr？dinger方程的时间分裂谱方法

国家自然科学基金

0+阅读 · 2014年12月31日

Orexin/OX1R激动FOXO1/Atg7干预胰岛β细胞自噬的机制及其在胰岛功能缺陷中的意义

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

奇性空间上的几何分析

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

探寻与高功能孤独症和Asperger综合征相关的拷贝数变异

国家自然科学基金

0+阅读 · 2009年12月31日

基于乙酰丙酮的离子液体在溶液中的结构及催化活性研究

国家自然科学基金

0+阅读 · 2008年12月31日

小电导Ca2+激活K+通道与ryanodine受体功能性偶联的研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员