Deep neural networks are typically initialized with random weights, with the initial variance chosen carefully to ensure stable signal propagation during training. However, choosing an appropriate variance becomes difficult, especially as the number of layers grows. In this work, we replace random weight initialization with a fully deterministic initialization scheme, ZerO, which initializes the weights of networks with only zeros and ones (up to a normalization factor), based on identity and Hadamard transforms. Through both theoretical and empirical studies, we demonstrate that ZerO is able to train networks without damaging their expressivity. Applying ZerO to ResNet achieves state-of-the-art performance on various datasets, including ImageNet, which suggests random weights may be unnecessary for network initialization. In addition, ZerO has many benefits, such as training ultra-deep networks (without batch normalization), exhibiting low-rank learning trajectories that result in low-rank and sparse solutions, and improving training reproducibility.
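To make the construction concrete, below is a minimal sketch of a zeros-and-ones initializer in the spirit described above: the identity (or a partial identity) where the dimensions allow it, and a normalized Sylvester-construction Hadamard block otherwise. The function name, the 1/sqrt(2^p) scaling, and the truncation of the Hadamard matrix are illustrative assumptions; the exact normalization constants used by ZerO may differ, so consult the paper for the precise algorithm.

```python
import numpy as np

def hadamard(p: int) -> np.ndarray:
    """Sylvester construction: H has shape (2**p, 2**p), entries +-1,
    and satisfies H @ H.T == (2**p) * I."""
    H = np.ones((1, 1))
    for _ in range(p):
        H = np.block([[H, H], [H, -H]])
    return H

def zeros_and_ones_init(m: int, n: int) -> np.ndarray:
    """Deterministic initializer for a weight matrix W in R^{m x n},
    built only from identity and Hadamard transforms (no randomness)."""
    if m <= n:
        # Square or dimension-decreasing layer: (partial) identity.
        return np.eye(m, n)
    # Dimension-increasing layer: a plain identity would confine training
    # to a low-rank subspace, so use a Hadamard block instead. Embed both
    # dimensions in the smallest power-of-two size that fits; the scaling
    # here is an assumption, not the paper's exact constant.
    p = int(np.ceil(np.log2(m)))
    H = hadamard(p) / np.sqrt(2.0 ** p)  # rows are orthonormal
    return H[:m, :n]
```

For example, `zeros_and_ones_init(4, 4)` returns the 4x4 identity, while `zeros_and_ones_init(8, 3)` returns the first three columns of a normalized 8x8 Hadamard matrix; either way, every entry is determined by the layer shape alone, which is what makes runs exactly reproducible.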