The activation function in a neural network introduces the non-linearity required to deal with complex tasks. Several activation/non-linearity functions have been developed for deep learning models. However, most of the existing activation functions suffer from the dying gradient problem and the non-utilization of large negative input values. In this paper, we propose a Linearly Scaled Hyperbolic Tangent (LiSHT) for Neural Networks (NNs) by scaling the Tanh linearly. The proposed LiSHT is non-parametric and tackles the dying gradient problem. We perform experiments on benchmark datasets of different types, such as vector data, image data, and natural language data. We observe superior performance using a Multi-layer Perceptron (MLP), a Residual Network (ResNet), and a Long Short-Term Memory (LSTM) network for data classification, image classification, and tweet classification tasks, respectively. The accuracy on the CIFAR100 dataset using the ResNet model with LiSHT improves by 9.48, 3.40, 3.16, 4.26, and 1.17\% as compared to Tanh, ReLU, PReLU, LReLU, and Swish, respectively. We also present qualitative results using the loss landscape, weight distribution, and activation maps in support of the proposed activation function.
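For reference, assuming the linear scaling amounts to multiplying the input element-wise with its Tanh response (a sketch of the stated idea, not necessarily the authors' exact formulation), the proposed activation can be written as
\begin{equation}
\phi(x) = x \cdot \tanh(x),
\end{equation}
so that large negative inputs still produce non-negative outputs, and the derivative $\phi'(x) = \tanh(x) + x\,\bigl(1 - \tanh^{2}(x)\bigr)$ is non-zero for all $x \neq 0$, which is consistent with the claim that the activation mitigates the dying gradient problem.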