Hash Layers For Large Sparse Models - 专知 Papers

Hash learning · Layers · Sparsity · MoDELS · Tokenizer

June 8, 2021

Hash Layers For Large Sparse Models

Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models. Specifically, we modify the feedforward layer to hash to different sets of weights depending on the current token, over all tokens in the sequence. We show that this procedure either outperforms or is competitive with learning-to-route mixture-of-expert methods such as Switch Transformers and BASE Layers, while requiring no routing parameters or extra terms in the objective function such as a load balancing loss, and no sophisticated assignment algorithm. We study the performance of different hashing techniques, hash sizes and input features, and show that balanced and random hashes focused on the most local features work best, compared to either learning clusters or using longer-range context. We show our approach works well both on large language modeling and dialogue tasks, and on downstream fine-tuning tasks.
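The routing idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `random_hash_table`, `balanced_hash_table`, and `HashFFN` are hypothetical, the experts are plain two-layer ReLU feedforward blocks, and the greedy frequency-based balancing is only one plausible way to build a balanced hash. The point it shows is that the expert for each position is a fixed lookup on the current token ID, so there are no routing parameters and no load balancing loss.

```python
import numpy as np


def random_hash_table(vocab_size, num_experts, seed=0):
    """Fixed random token-to-expert map, drawn once and never trained."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, num_experts, size=vocab_size)


def balanced_hash_table(token_counts, num_experts):
    """Greedy balanced map (an assumed construction): visit tokens from most
    to least frequent and give each one to the currently lightest expert."""
    token_counts = np.asarray(token_counts)
    order = np.argsort(-token_counts, kind="stable")
    load = np.zeros(num_experts)
    table = np.empty(len(token_counts), dtype=np.int64)
    for tok in order:
        expert = int(np.argmin(load))
        table[tok] = expert
        load[expert] += token_counts[tok]
    return table


class HashFFN:
    """Feedforward layer with one weight set per expert, selected by hashing
    the current token ID through a fixed lookup table."""

    def __init__(self, d_model, d_hidden, num_experts, table, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.asarray(table)
        self.w1 = rng.normal(0.0, 0.02, size=(num_experts, d_model, d_hidden))
        self.w2 = rng.normal(0.0, 0.02, size=(num_experts, d_hidden, d_model))

    def __call__(self, token_ids, hidden):
        # token_ids: (seq_len,) ints; hidden: (seq_len, d_model) floats.
        experts = self.table[token_ids]  # routing is a pure table lookup
        out = np.empty_like(hidden)
        for e in np.unique(experts):
            mask = experts == e
            h = np.maximum(hidden[mask] @ self.w1[e], 0.0)  # ReLU
            out[mask] = h @ self.w2[e]
        return out


if __name__ == "__main__":
    table = random_hash_table(vocab_size=100, num_experts=4)
    layer = HashFFN(d_model=8, d_hidden=16, num_experts=4, table=table)
    ids = np.array([3, 7, 3, 42])
    hidden = np.random.default_rng(1).normal(size=(4, 8))
    print(layer(ids, hidden).shape)
```

Because the table depends only on the token ID, every occurrence of a token is sent to the same expert regardless of context, which corresponds to the "most local features" setting the abstract reports as working best.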

