Activation functions play a pivotal role in determining the training dynamics and performance of neural networks. The widely adopted ReLU, despite being simple and effective, has a few disadvantages, including the dying ReLU problem. To tackle such problems, we propose a novel activation function called Serf, which is self-regularized and nonmonotonic in nature. Like Mish, Serf also belongs to the Swish family of functions. Based on several experiments on computer vision (image classification and object detection) and natural language processing (machine translation, sentiment classification and multimodal entailment) tasks with different state-of-the-art architectures, we observe that Serf vastly outperforms ReLU (the baseline) and other activation functions, including both Swish and Mish, with a markedly larger margin on deeper architectures. Ablation studies further demonstrate that Serf-based architectures perform better than their Swish and Mish counterparts, validating the effectiveness and compatibility of Serf across varying depths, complexities, optimizers, learning rates, batch sizes, initializers and dropout rates. Finally, we investigate the mathematical relation between Swish and Serf, showing that a preconditioner function ingrained in the first derivative of Serf provides a regularization effect that makes gradients smoother and optimization faster.
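As a concrete illustration of the Swish-family, self-gated activations discussed above, the sketch below implements Serf alongside Swish and Mish in PyTorch. The abstract does not state the functional form, so the definition used here, f(x) = x · erf(softplus(x)) (analogous to Mish's x · tanh(softplus(x))), is an assumption for illustration only, as are the function names.

```python
import torch
import torch.nn.functional as F


def serf(x: torch.Tensor) -> torch.Tensor:
    # Assumed form for illustration: x * erf(softplus(x)),
    # i.e. Mish with tanh replaced by the error function.
    return x * torch.erf(F.softplus(x))


def swish(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    # Swish / SiLU: x * sigmoid(beta * x).
    return x * torch.sigmoid(beta * x)


def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish: x * tanh(softplus(x)).
    return x * torch.tanh(F.softplus(x))


if __name__ == "__main__":
    # Evaluate Serf and its first derivative on a small grid;
    # the derivative is where the paper's preconditioner/regularization effect acts.
    x = torch.linspace(-5.0, 5.0, steps=11, requires_grad=True)
    y = serf(x)
    y.sum().backward()
    print(y.detach())   # activation values
    print(x.grad)       # dSerf/dx at the grid points
```

Like Swish and Mish, this form is smooth and nonmonotonic (it dips slightly below zero for small negative inputs before saturating), which is consistent with the properties the abstract attributes to Serf.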