We propose a system for calculating a "scaling constant" for the layers and weights of neural networks. We relate this scaling constant to two quantities important for the optimizability of neural networks, and argue that a network that is "preconditioned" via scaling, in the sense that all weights have the same scaling constant, will be easier to train. This scaling calculus has a number of consequences, among them that the geometric mean of the fan-in and fan-out, rather than the fan-in, the fan-out, or their arithmetic mean, should be used to set the initialization variance of the weights in a neural network. Our system allows for the offline design and engineering of ReLU neural networks, potentially replacing blind experimentation.
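As a concrete illustration of the geometric-mean initialization rule named above, the sketch below draws weights whose variance scales with the reciprocal of the geometric mean of fan-in and fan-out, in place of the fan-in alone (He) or the arithmetic mean (Glorot). The gain of 2.0 for ReLU layers and the function name geometric_mean_init are assumptions made for illustration; the abstract does not fix the constant.

```python
import numpy as np

def geometric_mean_init(fan_in, fan_out, gain=2.0, rng=None):
    """Sample a weight matrix whose variance is set by the geometric mean
    of fan-in and fan-out, rather than fan-in alone or their arithmetic mean.

    The ReLU gain of 2.0 is an assumption carried over from He-style
    initialization, not a value stated in the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Geometric mean of fan-in and fan-out in the denominator of the variance.
    variance = gain / np.sqrt(fan_in * fan_out)
    return rng.normal(0.0, np.sqrt(variance), size=(fan_out, fan_in))

# Example: a 256 -> 512 fully connected ReLU layer.
W = geometric_mean_init(256, 512)
print(W.std())  # roughly sqrt(2 / sqrt(256 * 512))
```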