An activation function is a key component in deep learning that performs a non-linear mapping between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can lead to inefficient training of deep neural networks: 1) the negative cancellation property of ReLU treats negative inputs as unimportant information for learning, resulting in performance degradation; 2) the inherently predefined nature of ReLU is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) the mean activation of ReLU is highly positive, which leads to a bias shift effect in the network layers; and 4) the multilinear structure of ReLU restricts the non-linear approximation power of the networks. To address these shortcomings, this paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. With ReLU as the baseline method, the experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. In addition, PFTS achieved the highest mean rank among the compared methods. The proposed PFTS exhibited higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
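The abstract does not spell out the functional form of PFTS. As an illustration only, the minimal PyTorch sketch below assumes a Flatten-T Swish-style activation, i.e. a Swish-like response x·sigmoid(x) shifted by a threshold T for non-negative inputs and flattened to T for negative inputs, with T made learnable (the "parametric" part). The class name, the initial value of T, and the exact piecewise form are assumptions, not the paper's definition.

```python
import torch
import torch.nn as nn

class PFTSSketch(nn.Module):
    """Hypothetical Parametric Flatten-T Swish-style activation.

    Assumed form: f(x) = x * sigmoid(x) + T for x >= 0, and f(x) = T for x < 0,
    where T is a learnable scalar threshold (initialised here at -0.20).
    """

    def __init__(self, init_t: float = -0.20):
        super().__init__()
        # Learnable threshold T; making it trainable is what distinguishes
        # a parametric variant from a fixed-threshold Flatten-T Swish.
        self.t = nn.Parameter(torch.tensor(init_t))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        swish = x * torch.sigmoid(x)                 # Swish-like term for x >= 0
        return torch.where(x >= 0, swish + self.t,   # shifted Swish branch
                           self.t.expand_as(x))      # flat branch for x < 0

# ReLU baseline for comparison: max(0, x), which zeroes out all negative inputs.
relu = nn.ReLU()
```

In this sketch, the learnable T addresses two of the listed shortcomings at once: negative inputs are no longer cancelled to exactly zero, and the threshold is learned rather than predefined.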