实现和理解在小规模变形器中系统性理由中普遍分配 (Achieving and Understanding Out-of-Distribution Generalization in Systematic Reasoning in Small-Scale Transformers) - 专知论文

会员服务 ·

0

泛化理论 · 可理解性 · 变换 · Networking · 样例 ·

2022 年 12 月 13 日

Achieving and Understanding Out-of-Distribution Generalization in Systematic Reasoning in Small-Scale Transformers

翻译：实现和理解在小规模变形器中系统性理由中普遍分配

Andrew J. Nam,Mustafa Abdool,Trevor Maxfield,James L. McClelland

Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This challenge is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language models have pushed the boundaries on how well neural networks can solve previously unseen problems, but their complexity and lack of clarity about the relevant content in their training data obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in small scale transformers trained with examples from a known distribution. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on a complex problem if the training set includes examples sampled from the whole distribution of simpler component tasks. Successful generalization depends on carefully managing positional alignment when absolute position encoding is used, but we find that suppressing sensitivity to absolute positions overcomes this limitation. Taken together our results represent a small step toward understanding and promoting systematic generalization in transformers.

翻译：对神经网络来说,传播通用性(OODG)是一个长期的挑战。这一挑战在定义明确的变量和规则的任务中非常明显,明确使用规则可以解决与变量特定值无关的问题,但网络往往与培训数据中抽样的数值范围相关。大型变压器语言模型拉动了神经网络能如何很好地解决先前不为人知的问题的界限,但其复杂性及其培训数据中相关内容的不明确性模糊了它们是如何实现这种稳健性的。作为了解基于变压器的系统如何普遍化的一个步骤,我们探索了小规模变压器中ODG的问题,以已知分布的范例来培训。我们利用基于谜题的推理任务表明,如果成套培训包括从整个更简单的组件任务分布中抽样的范例,ODG可以出现复杂的问题。成功化取决于在使用绝对位置编码时谨慎地管理位置上的调整,但我们发现抑制对绝对位置的敏感度克服了这一限制。我们的结果加在一起代表了在理解和促进变压器系统化过程中的一小步。

0

相关内容

泛化理论

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

53+阅读 · 2021年1月20日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

纳米颗粒与持久性有机污染物的复合毒性及其机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

功能性遗传变异调控BARD1/BRCA1泛素化通路的机制及与儿童神经母细胞瘤的关联研究

国家自然科学基金

0+阅读 · 2013年12月31日

电子垃圾污染土壤的典型持久性有机污染物二次排放规律及其区域影响

国家自然科学基金

0+阅读 · 2013年12月31日

POPs-重金属复合污染土壤的等离子体/粘土矿物联合修复行为与机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

土壤拟除虫菊酯低浓度长期暴露的毒性效应及致毒机理

国家自然科学基金

0+阅读 · 2012年12月31日

负载双金属纳米催化剂表面组成、结构与性能关系

国家自然科学基金

0+阅读 · 2012年12月31日

纳米银的胚胎发育毒性机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

痕量持久性有机污染物快速检测的光电流传感研究

国家自然科学基金

0+阅读 · 2011年12月31日

多孔介质中以超临界流体为载体的沉积过程动力学及其控制方法

国家自然科学基金

0+阅读 · 2008年12月31日

Word class representations spontaneously emerge in a deep neural network trained on next word prediction

Arxiv

0+阅读 · 2023年2月15日

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

Arxiv

0+阅读 · 2023年2月15日

Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models

Arxiv

0+阅读 · 2023年2月14日

Bag of Tricks for In-Distribution Calibration of Pretrained Transformers

Arxiv

0+阅读 · 2023年2月13日

A Theoretical Understanding of shallow Vision Transformers: Learning, Generalization, and Sample Complexity

Arxiv

0+阅读 · 2023年2月12日

Out-of-Distribution Representation Learning for Time Series Classification

Arxiv

1+阅读 · 2023年2月10日

Towards Reasoning in Large Language Models: A Survey

Arxiv

34+阅读 · 2022年12月20日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

Towards Out-Of-Distribution Generalization: A Survey

Arxiv

38+阅读 · 2021年8月31日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

12+阅读 · 2020年6月23日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

53+阅读 · 2021年1月20日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《无人机战争时代的战时法：大国竞争中的区分原则、相称性原则与行动建议》最新75页

《构建强健军事力量的设计挑战：提升海军兵力支持系统效能的多分辨率建模方法》69页

正视无人机心理战：恐惧效应与战略反思

《精确反蜂群防御系统：三维运动探测与定向空爆拦截技术融合》最新24页

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Word class representations spontaneously emerge in a deep neural network trained on next word prediction

Arxiv

0+阅读 · 2023年2月15日

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

Arxiv

0+阅读 · 2023年2月15日

Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models

Arxiv

0+阅读 · 2023年2月14日

Bag of Tricks for In-Distribution Calibration of Pretrained Transformers

Arxiv

0+阅读 · 2023年2月13日

A Theoretical Understanding of shallow Vision Transformers: Learning, Generalization, and Sample Complexity

Arxiv

0+阅读 · 2023年2月12日

Out-of-Distribution Representation Learning for Time Series Classification

Arxiv

1+阅读 · 2023年2月10日

Towards Reasoning in Large Language Models: A Survey

Arxiv

34+阅读 · 2022年12月20日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

Towards Out-Of-Distribution Generalization: A Survey

Arxiv

38+阅读 · 2021年8月31日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

12+阅读 · 2020年6月23日

相关基金

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

纳米颗粒与持久性有机污染物的复合毒性及其机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

功能性遗传变异调控BARD1/BRCA1泛素化通路的机制及与儿童神经母细胞瘤的关联研究

国家自然科学基金

0+阅读 · 2013年12月31日

电子垃圾污染土壤的典型持久性有机污染物二次排放规律及其区域影响

国家自然科学基金

0+阅读 · 2013年12月31日

POPs-重金属复合污染土壤的等离子体/粘土矿物联合修复行为与机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

土壤拟除虫菊酯低浓度长期暴露的毒性效应及致毒机理

国家自然科学基金

0+阅读 · 2012年12月31日

负载双金属纳米催化剂表面组成、结构与性能关系

国家自然科学基金

0+阅读 · 2012年12月31日

纳米银的胚胎发育毒性机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

痕量持久性有机污染物快速检测的光电流传感研究

国家自然科学基金

0+阅读 · 2011年12月31日

多孔介质中以超临界流体为载体的沉积过程动力学及其控制方法

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员