Surrogate gradient (SG) training offers a way to quickly transfer the advances made in deep learning to neuromorphic computing and neuromorphic processors, with a consequent reduction in energy consumption. Evidence suggests that training can be robust to the choice of SG shape, provided an extensive hyper-parameter search is performed. However, random or grid search over hyper-parameters grows exponentially in cost as more hyper-parameters are considered, and every point in the search can itself be highly time- and energy-consuming for large networks and large datasets. In this article we show that more complex tasks and networks are more sensitive to the choice of SG. Secondly, we show that low dampening, high sharpness, and low tail fatness are preferred. Thirdly, we observe that Glorot Uniform initialization is generally preferred by most SG choices, albeit with variability across results. Finally, we provide a theoretical approach to reduce the need for extensive grid search, identifying SG shapes and initializations that improve accuracy.
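As a minimal sketch of the shape parameters discussed above, the snippet below implements one common SG family, the derivative of a fast sigmoid, parameterized by a dampening factor (peak height) and a sharpness factor (peak width); the exact parameterization here is illustrative, not the paper's definitive formulation:

```python
import numpy as np

def fast_sigmoid_sg(v, dampening=0.3, sharpness=1.0):
    """Surrogate gradient based on the derivative of a fast sigmoid.

    Used in place of the (almost everywhere zero) derivative of the
    spiking step function during backpropagation:
      - dampening scales the peak height at v = 0,
      - sharpness narrows the curve, concentrating gradient flow
        near the firing threshold.
    """
    return dampening / (1.0 + sharpness * np.abs(v)) ** 2

# Membrane potentials relative to the firing threshold.
v = np.linspace(-3.0, 3.0, 7)
grads = fast_sigmoid_sg(v)
```

Note that this family has polynomially decaying tails; SGs with exponentially decaying tails (lower tail fatness) route less gradient through neurons far from threshold.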