The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks. It gives rise to insightful theories and useful tools for understanding and optimizing neural networks. Given its high computational cost, practitioners often use random estimators and evaluate only the diagonal entries. We examine two popular estimators whose accuracy and sample complexity depend on their associated variances. We derive bounds on these variances and instantiate them for neural networks used in regression and classification. We analyze the trade-offs between the two estimators based on analytical and numerical studies. We find that the variance quantities depend on the non-linearity with respect to different parameter groups and should not be neglected when estimating the Fisher information.
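To make the setting concrete, below is a minimal sketch of the kind of random diagonal estimator the abstract refers to: a Monte Carlo gradient-outer-product estimate of the Fisher diagonal, where labels are sampled from the model's own predictive distribution. It assumes a PyTorch classifier with categorical outputs; the function name `fisher_diag_mc` and its arguments are hypothetical illustrations, not names from the paper.

```python
import torch

def fisher_diag_mc(model, xs, n_samples=10):
    """Monte Carlo estimate of the diagonal of the Fisher information.

    For each input x, labels y are drawn from the model's predictive
    distribution p(y | x, theta), and the squared per-example gradient
    of log p(y | x, theta) is accumulated. The accuracy of the running
    average is governed by the variance of these squared-gradient
    terms, which is the quantity the paper bounds.
    """
    params = list(model.parameters())
    diag = [torch.zeros_like(p) for p in params]
    count = 0
    for x in xs:                                   # one example at a time
        logits = model(x.unsqueeze(0))             # shape (1, num_classes)
        dist = torch.distributions.Categorical(logits=logits)
        for _ in range(n_samples):
            y = dist.sample()                      # y ~ p(y | x, theta)
            logp = dist.log_prob(y).sum()
            grads = torch.autograd.grad(logp, params, retain_graph=True)
            for d, g in zip(diag, grads):
                d += g.detach() ** 2               # diagonal of the outer product
            count += 1
    return [d / count for d in diag]
```

Note that the labels are sampled from the model rather than taken from the data; using observed labels instead would yield the empirical Fisher, a different (and generally biased) quantity.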