Recently, self-normalizing neural networks (SNNs) have been proposed with the aim of avoiding batch or weight normalization. The key step in SNNs is to properly scale the exponential linear unit (referred to as SELU) so that normalization is inherently incorporated, based on the central limit theorem. SELU is a monotonically increasing function with an approximately constant negative output for large negative inputs. In this work, we propose a new activation function that breaks the monotonicity of SELU while still preserving the self-normalizing property. Unlike SELU, the new function introduces a bump-shaped response in the negative-input region by regularizing a linear function with a scaled exponential function, and is referred to as the scaled exponentially-regularized linear unit (SERLU). The bump-shaped function has an approximately zero response to large negative inputs while statistically pushing the output of SERLU towards zero mean. To effectively combat over-fitting, we develop a so-called shift-dropout for SERLU, which includes standard dropout as a special case. Experimental results on MNIST, CIFAR10 and CIFAR100 show that SERLU-based neural networks provide consistently promising results compared with five other activation functions: ELU, SELU, Swish, Leaky ReLU and ReLU.
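To make the piecewise shape described above concrete, here is a minimal NumPy sketch of an activation with the stated behaviour: linear for non-negative inputs, and a bump-shaped x·exp(x) term for negative inputs that decays to roughly zero for large negative inputs. The scale constants `lambda_` and `alpha_` are placeholders (assumptions for illustration); in the paper they are chosen to satisfy the self-normalizing fixed-point conditions, and their exact values are not given in this abstract.

```python
import numpy as np

# Placeholder constants (assumed for illustration, not the paper's fitted values).
lambda_ = 1.07
alpha_ = 2.90

def serlu(x):
    """Sketch of a SERLU-like activation.

    For x >= 0: a scaled linear response lambda_ * x.
    For x < 0:  a bump-shaped term lambda_ * alpha_ * x * exp(x), which is
    negative near zero (minimum at x = -1) and decays towards zero for
    large negative inputs, breaking monotonicity.
    """
    x = np.asarray(x, dtype=np.float64)
    return np.where(x >= 0.0, lambda_ * x, lambda_ * alpha_ * x * np.exp(x))

# Quick check of the shape: near-zero response far in the negative tail,
# a negative bump around x = -1, linear growth for positive inputs.
print(serlu(np.array([-10.0, -1.0, 0.0, 2.0])))
```

The non-monotonic negative bump is what distinguishes this shape from SELU's flat negative saturation.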
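The abstract states only that shift-dropout includes standard dropout as a special case. One plausible (assumed) reading is that dropped units are set to a constant shift rather than zero, so that a zero shift recovers ordinary inverted dropout; the sketch below encodes that assumption and is not the paper's exact formulation.

```python
import numpy as np

def shift_dropout(x, rate=0.1, shift=0.0, rng=None, training=True):
    """Hypothetical shift-dropout sketch (assumption, not the paper's definition).

    Dropped units take the constant value `shift` instead of zero; kept units
    are rescaled as in inverted dropout. With shift = 0.0 this reduces to
    standard dropout.
    """
    if not training or rate <= 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), shift)

# Example: applying the sketch to SERLU-style pre-activations during training.
h = np.random.randn(4, 8)
print(shift_dropout(h, rate=0.2, shift=-0.3))
```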