Deep feedforward networks initialized along the edge of chaos exhibit exponentially greater trainability, as quantified by the maximum trainable depth. In this work, we explore the effect of saturation of the tanh activation function along the edge of chaos. In particular, we determine the line of uniformity in phase space, along which the post-activation distribution has maximum entropy. This line intersects the edge of chaos and marks the regime beyond which saturation of the activation function begins to impede training efficiency. Our results suggest that initialization along the edge of chaos is a necessary, but not sufficient, condition for optimal trainability.
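For context on the critical initialization referenced above, the following is a minimal numerical sketch (not the paper's code) of the standard mean-field edge-of-chaos criterion for tanh networks, assuming the usual formulation in which weights have variance σ_w²/N and biases variance σ_b²: the critical line in the (σ_w, σ_b) phase plane is where the gradient susceptibility χ₁ = σ_w² E[tanh′(√q* z)²] equals 1, with q* the fixed point of the length-map recursion. All function names and numerical choices are illustrative assumptions.

```python
# Sketch of the mean-field edge-of-chaos condition for tanh networks
# (standard formulation; see Poole et al. 2016, Schoenholz et al. 2017).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def gauss_expect(f, q):
    """E[f(sqrt(q) z)] for z ~ N(0, 1), computed by quadrature."""
    integrand = lambda z: f(np.sqrt(q) * z) * np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
    val, _ = quad(integrand, -10.0, 10.0)
    return val

def qstar(sigma_w, sigma_b, iters=200):
    """Fixed point q* of the length map
    q <- sigma_w^2 * E[tanh(sqrt(q) z)^2] + sigma_b^2."""
    q = 1.0
    for _ in range(iters):
        q = sigma_w**2 * gauss_expect(lambda h: np.tanh(h)**2, q) + sigma_b**2
    return q

def chi1(sigma_w, sigma_b):
    """Gradient susceptibility chi_1 = sigma_w^2 * E[tanh'(sqrt(q*) z)^2],
    where tanh'(h) = sech^2(h); chi_1 = 1 defines the edge of chaos."""
    q = qstar(sigma_w, sigma_b)
    return sigma_w**2 * gauss_expect(lambda h: 1.0 / np.cosh(h)**4, q)

def edge_of_chaos_sigma_w(sigma_b):
    """Critical weight std for a given bias std (root of chi_1 - 1)."""
    return brentq(lambda sw: chi1(sw, sigma_b) - 1.0, 0.5, 3.0)

if __name__ == "__main__":
    for sb in (0.0, 0.05, 0.1, 0.2):
        print(f"sigma_b = {sb:.2f}  ->  critical sigma_w ~ {edge_of_chaos_sigma_w(sb):.4f}")
```

Running this sketch recovers the familiar picture in which the critical line passes through (σ_w, σ_b) ≈ (1, 0) and shifts toward larger σ_w as σ_b grows; the abstract's "line of uniformity" is a separate curve in this same phase plane, defined by maximizing the entropy of the post-activation distribution.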