Deep neural networks often suffer from poor performance or even training failure due to the ill-conditioned problem, the vanishing/exploding gradient problem, and the saddle point problem. In this paper, a novel method that applies a gradient activation function (GAF) to the gradient is proposed to handle these challenges. Intuitively, the GAF enlarges tiny gradients and restricts large gradients. Theoretically, this paper gives the conditions that the GAF needs to satisfy and, on this basis, proves that the GAF alleviates the problems mentioned above. In addition, this paper proves that the convergence rate of SGD with the GAF is faster than that without the GAF under some assumptions. Furthermore, experiments on CIFAR, ImageNet, and PASCAL Visual Object Classes confirm the GAF's effectiveness. The experimental results also demonstrate that the proposed method can be adopted in various deep neural networks to improve their performance. The source code is publicly available at https://github.com/LongJin-lab/Activated-Gradients-for-Deep-Neural-Networks.
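To make the intuition concrete, the following is a minimal PyTorch sketch of applying an element-wise gradient activation before the optimizer update. The arctan-shaped function and the hyperparameters `alpha` and `beta` are illustrative assumptions for this sketch, not necessarily the exact GAF form or settings used in the paper; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn

def gaf(grad, alpha=3.0, beta=0.1):
    # Illustrative gradient activation function (assumed arctan form):
    # amplifies small gradients (~alpha * grad near zero) and bounds
    # large gradients (by alpha * pi / (2 * beta)).
    return (alpha / beta) * torch.atan(beta * grad)

# Hypothetical training step showing where the GAF is applied.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p.grad.copy_(gaf(p.grad))  # activate gradients before the update
optimizer.step()
```

Because the transformation acts only on the gradients, the same pattern can wrap any gradient-based optimizer and network architecture without changing the model itself.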