K-FAC is a successful tractable implementation of Natural Gradient for Deep Learning, which nevertheless suffers from the requirement to compute the inverses of the Kronecker factors (through an eigen-decomposition). This can be very time-consuming (or even prohibitive) when these factors are large. In this paper, we show theoretically that, owing to the exponential-average construction paradigm of the Kronecker factors that is typically used, their eigen-spectrum must decay. We show numerically that in practice this decay is very rapid, leading to the idea that substantial computation can be saved by focusing only on the first few eigen-modes when inverting the Kronecker factors. Randomized Numerical Linear Algebra provides us with the necessary tools to do so. Numerical results show we obtain a $\approx 2.5\times$ reduction in per-epoch time and a $\approx 3.3\times$ reduction in time to target accuracy. We compare our proposed sped-up K-FAC versions with a more computationally efficient NG implementation, SENG, and observe that we perform on par with it.
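To make the core idea concrete, the following is a minimal NumPy sketch of the ingredients named above: an exponential-moving-average update of a Kronecker factor, a randomized truncated eigendecomposition (a standard randomized range-finder, not necessarily the paper's exact routine), and a low-rank approximate inverse that falls back to damping outside the captured subspace. All names (`ema_update`, `randomized_eigh`, `low_rank_inverse_apply`) and parameter choices (`rho`, `damping`, oversampling, power iterations) are illustrative assumptions, not the paper's API.

```python
import numpy as np

def ema_update(A, a_batch, rho=0.95):
    """Exponential-moving-average update of a Kronecker factor:
    A <- rho * A + (1 - rho) * a^T a / batch_size  (illustrative form)."""
    return rho * A + (1 - rho) * (a_batch.T @ a_batch) / a_batch.shape[0]

def randomized_eigh(A, rank, n_oversample=10, n_iter=2, rng=None):
    """Top-`rank` eigenpairs of a symmetric PSD matrix A via a randomized
    range finder with a few power iterations (Halko et al.-style sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    k = min(rank + n_oversample, n)
    Y = A @ rng.standard_normal((n, k))
    for _ in range(n_iter):              # power iterations sharpen the decayed spectrum
        Q, _ = np.linalg.qr(Y)
        Y = A @ Q
    Q, _ = np.linalg.qr(Y)
    w, V = np.linalg.eigh(Q.T @ A @ Q)   # small k x k problem
    idx = np.argsort(w)[::-1][:rank]
    return w[idx], Q @ V[:, idx]         # top eigenvalues and eigenvectors

def low_rank_inverse_apply(w, U, damping, G):
    """Apply (A + damping * I)^{-1} to G using only the retained eigen-modes:
    (A + lam I)^{-1} ~= I/lam + U diag(1/(w+lam) - 1/lam) U^T."""
    coeff = 1.0 / (w + damping) - 1.0 / damping
    return G / damping + U @ (coeff[:, None] * (U.T @ G))
```

Because the eigen-spectrum decays rapidly, a small `rank` can capture most of the factor's action, so the cubic-cost full eigendecomposition is replaced by operations costing roughly $O(n^2 k)$ with $k \ll n$.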