In this work, we propose a data-driven scheme to initialize the parameters of a deep neural network. This is in contrast to traditional approaches, which randomly initialize parameters by sampling from transformed standard distributions and thus do not use the training data to produce a more informed initialization. Our method proceeds sequentially, layer by layer, initializing each layer from its input activations. The initialization is cast as an optimization problem that minimizes a combination of encoding and decoding losses over the input activations, further constrained by a user-defined latent code. This problem is then restructured into the well-known Sylvester equation, which admits fast and efficient gradient-free solutions. Our data-driven method yields a performance boost over random initialization methods, both before training begins and after it completes. We show that the proposed method is especially effective in few-shot and fine-tuning settings. We conclude with analyses of time complexity and of the effect of different latent codes on recognition performance.
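To make the Sylvester reduction concrete, below is a minimal sketch of one possible layer-wise solve. It assumes a simple symmetric objective, minimizing the encoding loss ||WX - Z||_F^2 plus the decoding loss ||W^T Z - X||_F^2, whose stationarity condition (Z Z^T) W + W (X X^T) = 2 Z X^T is a Sylvester equation solvable in closed form. The exact losses, the construction of the latent code Z, and the helper name init_layer_weights are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def init_layer_weights(X, Z):
    """Initialize one layer's weights from its input activations.

    A minimal sketch, assuming the objective
        min_W ||W X - Z||_F^2 + ||W^T Z - X||_F^2,
    where X (d x n) holds n activation samples and Z (k x n) is a
    user-defined latent code. Setting the gradient to zero gives the
    Sylvester equation (Z Z^T) W + W (X X^T) = 2 Z X^T, which
    solve_sylvester handles without any gradient descent.
    """
    A = Z @ Z.T          # k x k left coefficient
    B = X @ X.T          # d x d right coefficient
    C = 2.0 * Z @ X.T    # k x d right-hand side
    return solve_sylvester(A, B, C)  # W has shape k x d

# Usage: initialize a 64-unit layer from 256 activation samples of dim 128.
# A random latent code stands in for whatever code the method prescribes.
rng = np.random.default_rng(0)
X = rng.standard_normal((128, 256))   # input activations to this layer
Z = rng.standard_normal((64, 256))    # illustrative user-defined latent code
W = init_layer_weights(X, Z)
print(W.shape)  # (64, 128)
```

In a sequential scheme of this kind, the solved layer's outputs (after any nonlinearity) would become the activations X for initializing the next layer.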