Bidirectionally Self-Normalizing Neural Networks

The problem of vanishing and exploding gradients has been a long-standing obstacle to the effective training of neural networks. Although various tricks and techniques have been employed to alleviate the problem in practice, satisfactory theories and provable solutions are still lacking. In this paper, we address the problem from the perspective of high-dimensional probability theory. We provide a rigorous result showing that, under mild conditions, the vanishing/exploding gradients problem disappears with high probability if the neural network is sufficiently wide. Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network through a new class of activation functions, namely Gaussian-Poincaré normalized functions, together with orthogonal weight matrices. Experiments on both synthetic and real-world data validate our theory and confirm its effectiveness on very deep neural networks when applied in practice.
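The abstract names the two ingredients but does not define them, so the following is only a rough sketch under stated assumptions. It assumes that an activation φ counts as Gaussian-Poincaré normalized when E[φ(x)²] = 1 and E[φ'(x)²] = 1 for x ~ N(0, 1), and that an ordinary activation can be brought into this form by an affine rescaling aφ(x) + b. The helper names (`gaussian_expectation`, `gpn_constants`, `random_orthogonal`) and the choice of tanh, width, and depth are illustrative and do not come from the source.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_expectation(f, order=100):
    """Approximate E[f(x)] for x ~ N(0, 1) with Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(order)  # weight function exp(-x^2 / 2)
    return np.sum(weights * f(nodes)) / np.sqrt(2.0 * np.pi)


def gpn_constants(phi, dphi):
    """Find a, b so that a*phi + b meets the two ASSUMED moment conditions.

    E[(a*phi')^2] = 1 fixes a; E[(a*phi + b)^2] = 1 is then a quadratic in b.
    """
    m1 = gaussian_expectation(phi)
    m2 = gaussian_expectation(lambda x: phi(x) ** 2)
    d2 = gaussian_expectation(lambda x: dphi(x) ** 2)
    a = 1.0 / np.sqrt(d2)
    # The Gaussian-Poincare inequality Var[phi] <= E[phi'^2] keeps this
    # discriminant non-negative; max() only guards against rounding error.
    disc = max(1.0 - a ** 2 * (m2 - m1 ** 2), 0.0)
    b = -a * m1 + np.sqrt(disc)
    return a, b


def random_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # sign fix so the sample is uniform


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = gpn_constants(np.tanh, lambda x: 1.0 - np.tanh(x) ** 2)

    width, depth = 256, 20  # illustrative sizes only
    h = rng.standard_normal(width)
    h *= np.sqrt(width) / np.linalg.norm(h)  # start with mean square 1
    for _ in range(depth):
        h = a * np.tanh(random_orthogonal(width, rng) @ h) + b
    print("mean squared activation after", depth, "layers:", np.mean(h ** 2))
```

Running the script prints the mean squared activation after the final layer, which gives a quick empirical look at whether the forward signal scale is preserved at this width. It is a toy check, not a reproduction of the paper's experiments.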

