In optimization for machine learning (ML), curvature-matrix (CM) estimates typically rely on an exponential average (EA) of local estimates, giving EA-CM algorithms. This approach has little principled justification but is used very often in practice. In this paper, we draw a connection between EA-CM algorithms and what we call a "Wake of Quadratic-Regularized Models". The outlined connection lets us understand, from an optimization perspective, what EA-CM algorithms are doing. Generalizing from the established connection, we propose a new family of algorithms, "KL-Divergence Wake-Regularized Models" (KLD-WRM). We give three different practical instantiations of KLD-WRM and show numerically that these outperform K-FAC on MNIST.
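For concreteness, a minimal sketch of the exponential averaging step that EA-CM algorithms apply to their curvature estimates is given below. The decay factor `beta` and the rank-1 empirical-Fisher-style construction of the local estimate are illustrative assumptions, not the specific choices made in the paper or in K-FAC:

```python
import numpy as np

def ea_cm_update(C_avg, C_local, beta=0.95):
    """One exponential-averaging (EA) step for a curvature-matrix (CM) estimate.

    C_avg:   running EA of the curvature matrix (e.g., a Fisher block)
    C_local: curvature estimate computed from the current mini-batch
    beta:    decay factor (illustrative value; an assumption, not the paper's)
    """
    return beta * C_avg + (1.0 - beta) * C_local

# Toy usage: accumulate noisy local curvature estimates for a 3-parameter model.
rng = np.random.default_rng(0)
C_avg = np.eye(3)
for _ in range(100):
    g = rng.normal(size=(3, 1))   # stand-in for a per-batch gradient sample
    C_local = g @ g.T             # rank-1 empirical-Fisher-style local estimate
    C_avg = ea_cm_update(C_avg, C_local)
```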