We introduce stochastic activations, a novel strategy that randomly selects between several non-linear functions in the feed-forward layer of a large language model. In particular, we choose between SILU and RELU depending on a Bernoulli draw. This strategy circumvents the optimization problem associated with RELU, namely, its constant output for negative inputs, which prevents gradient flow. We leverage this strategy in two ways: (1) We use stochastic activations during pre-training and fine-tune the model with RELU, which is used at inference time to produce sparse latent vectors. This reduces the inference FLOPs and translates into a significant speedup on CPU. Interestingly, this leads to much better results than training from scratch with the RELU activation function. (2) We evaluate stochastic activations for generation. This strategy performs reasonably well: it is only slightly inferior to the best deterministic non-linearity, namely SILU combined with temperature scaling. This offers an alternative to existing strategies by providing a controlled way to increase the diversity of the generated text.
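To make the core mechanism concrete, the following is a minimal sketch of a stochastic activation in PyTorch, assuming one Bernoulli draw per forward pass and a hypothetical mixing probability `p_silu`; the actual sampling granularity (per layer, per token, or per element) and the schedule used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def stochastic_activation(x: torch.Tensor, p_silu: float = 0.5, training: bool = True) -> torch.Tensor:
    """Sketch of a stochastic activation: apply SILU with probability p_silu,
    otherwise RELU (assumption: a single Bernoulli draw per forward pass).

    At fine-tuning/inference time (training=False), fall back to RELU so that
    negative pre-activations are zeroed out, yielding sparse latent vectors.
    """
    if training and torch.rand(()) < p_silu:
        return F.silu(x)   # smooth non-linearity, keeps gradients for negative inputs
    return F.relu(x)       # sparse non-linearity used at inference time
```

In this sketch, sampling SILU during pre-training keeps gradients flowing through negative inputs, while switching to RELU afterwards recovers the sparsity that reduces inference FLOPs.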