Using a mean-field theory of signal propagation, we analyze the evolution of correlations between two signals propagating through a ReLU network with correlated weights. Signals become highly correlated in deep ReLU networks with uncorrelated weights. We show that ReLU networks with anti-correlated weights can avoid this fate and have a chaotic phase where the correlations saturate below unity. Consistent with this analysis, we find that networks initialized with anti-correlated weights can train faster (in a teacher-student setting) by taking advantage of the increased expressivity in the chaotic phase. Combining this with a previously proposed strategy of using an asymmetric initialization to reduce dead ReLU probability, we propose an initialization scheme that allows faster training and learning than the best-known methods.
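As an illustration of the baseline claim that signals become highly correlated in deep ReLU networks with uncorrelated weights, the following is a minimal sketch, not the authors' code. It assumes the standard mean-field correlation map for ReLU with i.i.d. Gaussian weights of variance 2/fan_in and zero biases (the normalized degree-1 arc-cosine kernel), and compares it with a finite-width simulation. The function names, width, depth, and seed are hypothetical choices made for illustration; the anti-correlated initialization proposed in the abstract is not reproduced here because its construction is not given in this text.

```python
import numpy as np


def relu_correlation_map(c):
    """One-layer correlation map for ReLU with i.i.d. weights.

    Assumption: weight variance 2/fan_in and zero biases, so the signal
    variance is preserved and only the correlation c evolves.
    """
    c = np.clip(c, -1.0, 1.0)
    return (np.sqrt(1.0 - c**2) + c * (np.pi - np.arccos(c))) / np.pi


def propagate_correlation(c0, depth):
    """Iterate the map for `depth` layers; returns [c_0, c_1, ..., c_depth]."""
    cs = [float(c0)]
    for _ in range(depth):
        cs.append(float(relu_correlation_map(cs[-1])))
    return cs


def cosine(a, b):
    """Cosine similarity, used as the finite-width estimate of c."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def simulate_correlation(c0, depth, width=1000, seed=0):
    """Push two correlated Gaussian signals through a random ReLU network."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(width)
    v = rng.standard_normal(width)
    h1 = u
    h2 = c0 * u + np.sqrt(1.0 - c0**2) * v  # correlation with h1 is roughly c0
    cs = [cosine(h1, h2)]
    for _ in range(depth):
        # Uncorrelated weights with variance 2 / fan_in
        W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        h1 = W @ np.maximum(h1, 0.0)
        h2 = W @ np.maximum(h2, 0.0)
        cs.append(cosine(h1, h2))
    return cs


if __name__ == "__main__":
    theory = propagate_correlation(0.0, 10)
    empirical = simulate_correlation(0.0, 10)
    for layer, (t, e) in enumerate(zip(theory, empirical)):
        print(f"layer {layer:2d}  theory {t:.3f}  simulation {e:.3f}")
```

Under these assumptions the map has a fixed point at c = 1 that is approached slowly but monotonically from any starting correlation, which is the behavior the abstract says anti-correlated weights can avoid.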