Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Boolean functions to demonstrate the following: (i) Random Transformers are relatively more biased towards functions of low sensitivity. (ii) When trained on Boolean functions, both Transformers and LSTMs prioritize learning functions of low sensitivity, with Transformers ultimately converging to functions of lower sensitivity. (iii) On sparse Boolean functions, which have low sensitivity, we find that Transformers generalize near perfectly even in the presence of noisy labels, whereas LSTMs overfit and achieve poor generalization accuracy. Overall, our results provide strong, quantifiable evidence of differences in the inductive biases of Transformers and recurrent models, which may help explain Transformers' effective generalization performance despite their relatively limited expressiveness.
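For concreteness, the sensitivity referred to above can be read in its standard Boolean-function sense; a minimal statement of that notion, assuming the usual average-sensitivity formulation, is

$$
s(f, x) = \left|\{\, i \in [n] : f(x) \neq f(x^{\oplus i}) \,\}\right|, \qquad
\bar{s}(f) = \mathbb{E}_{x \sim \mathrm{Unif}(\{0,1\}^n)}\big[s(f, x)\big],
$$

where $f : \{0,1\}^n \to \{0,1\}$ and $x^{\oplus i}$ denotes $x$ with its $i$-th bit flipped. Under this reading, low-sensitivity functions are those whose output rarely changes when a single input bit is flipped, which is why sparse Boolean functions serve as a natural low-sensitivity test bed.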