Less is More! A slim architecture for optimal language translation - 专知论文

会员服务 ·

0

可约的 · Perplexity · 层 · state-of-the-art · SLIM ·

2023 年 5 月 18 日

Less is More! A slim architecture for optimal language translation

翻译：暂无翻译

Luca Herranz-Celotti,Ermal Rrapaj

The softmax attention mechanism has emerged as a noteworthy development in the field of Artificial Intelligence research, building on the successes of Transformer-based architectures. However, their ever increasing sizes necessitate ever increasing computational memory, that limits their usage. We propose KgV, a sigmoid gating mechanism that, in conjunction with softmax attention, significantly boosts performance without increasing architecture size. To amend the size requirements, we leverage Tensor Chains to identify and prune the excess parameters. We find that such excess resides primarily within the embedding layer, and not in the output linear layer. To further improve embedding and significantly reduce parameters, we introduce H-SoftPOS, a hierarchical embedding layer which simultaneously enhances performance. Remarkably, on the WMT14 English-German validation set, our approach yields a threefold reduction in perplexity, surpassing the current state-of-the-art, while reducing parameter counts also by a factor of 3. When we further reduce the number of parameters up to sevenfold, we can still achieve a 21\% decrease in perplexity with respect to the baseline Transformer. To understand generalization capabilities, we conduct experiments on the 7 language pairs of the WMT17 dataset. Our method outperforms existing techniques in terms of test loss while simultaneously halving the number of parameters. Moreover, we observe a 70 times reduction in variance with respect to the prior state-of-the-art. In conclusion, our proposed method yields significant improvements in performance and much lower memory cost. We call the resulting architecture Anthe.

翻译：暂无翻译

0

相关内容

可约的

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

59+阅读 · 2020年1月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】全卷积语义分割综述

【推荐】全卷积语义分割综述

机器学习研究会

19+阅读 · 2017年8月31日

瞬变电磁法基于有限差分的2.5维反演方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

Zakharov系统的解的动力学行为研究

国家自然科学基金

0+阅读 · 2015年12月31日

一类离散Hindmarsh-Rose模型的分支延拓

国家自然科学基金

0+阅读 · 2015年12月31日

两类分数阶发展方程解的适定性及吸引子

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

高效中红外激光晶体Cr,Er,Re:YSGG（Re＝Eu3+, Tb3+）的生长及性能研究

国家自然科学基金

0+阅读 · 2012年12月31日

视黄醛蛋白Leptosphaeria Rhodopsin中的质子跨膜传递机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

新兴污染物HO-PBDEs在水相中的环境光化学活性

国家自然科学基金

0+阅读 · 2009年12月31日

Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient Neural Image Compression

Arxiv

0+阅读 · 2023年7月5日

Dynamical Isometry based Rigorous Fair Neural Architecture Search

Arxiv

0+阅读 · 2023年7月5日

AutoST: Training-free Neural Architecture Search for Spiking Transformers

Arxiv

0+阅读 · 2023年7月1日

Neural Architecture Search without Training

Neural Architecture Search without Training

Arxiv

10+阅读 · 2021年6月11日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

Arxiv

39+阅读 · 2019年1月17日

Neural Architecture Search: A Survey

Arxiv

12+阅读 · 2018年9月5日

Automatically Designing CNN Architectures for Medical Image Segmentation

Automatically Designing CNN Architectures for Medical Image Segmentation

Arxiv

10+阅读 · 2018年7月19日

Mobile Video Object Detection with Temporally-Aware Feature Maps

Arxiv

11+阅读 · 2018年3月28日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

59+阅读 · 2020年1月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《陆军战斗操练中的关键事件诊断》

《自适应训练辅助概念及其在空战管理员加速训练中的应用导论》最新126页

军事通信市场七大趋势概述

《抗干扰无人机蜂群行为的遗传算法方法》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】全卷积语义分割综述

【推荐】全卷积语义分割综述

机器学习研究会

19+阅读 · 2017年8月31日

相关论文

Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient Neural Image Compression

Arxiv

0+阅读 · 2023年7月5日

Dynamical Isometry based Rigorous Fair Neural Architecture Search

Arxiv

0+阅读 · 2023年7月5日

AutoST: Training-free Neural Architecture Search for Spiking Transformers

Arxiv

0+阅读 · 2023年7月1日

Neural Architecture Search without Training

Neural Architecture Search without Training

Arxiv

10+阅读 · 2021年6月11日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

Arxiv

39+阅读 · 2019年1月17日

Neural Architecture Search: A Survey

Arxiv

12+阅读 · 2018年9月5日

Automatically Designing CNN Architectures for Medical Image Segmentation

Automatically Designing CNN Architectures for Medical Image Segmentation

Arxiv

10+阅读 · 2018年7月19日

Mobile Video Object Detection with Temporally-Aware Feature Maps

Arxiv

11+阅读 · 2018年3月28日

相关基金

瞬变电磁法基于有限差分的2.5维反演方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

Zakharov系统的解的动力学行为研究

国家自然科学基金

0+阅读 · 2015年12月31日

一类离散Hindmarsh-Rose模型的分支延拓

国家自然科学基金

0+阅读 · 2015年12月31日

两类分数阶发展方程解的适定性及吸引子

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

高效中红外激光晶体Cr,Er,Re:YSGG（Re＝Eu3+, Tb3+）的生长及性能研究

国家自然科学基金

0+阅读 · 2012年12月31日

视黄醛蛋白Leptosphaeria Rhodopsin中的质子跨膜传递机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

新兴污染物HO-PBDEs在水相中的环境光化学活性

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员