Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating these divergences relies on parametrizing an empirical variational form by a neural network (NN) and optimizing over the parameter space. Such neural estimators are abundantly used in practice, but the corresponding performance guarantees are partial and call for further exploration. We establish non-asymptotic absolute error bounds for a neural estimator realized by a shallow NN, focusing on four popular $\mathsf{f}$-divergences -- Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory to bound the two sources of error involved: function approximation and empirical estimation. The bounds characterize the effective error in terms of the NN size and the number of samples, and reveal scaling rates that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above, with an appropriate NN growth rate, are minimax rate-optimal, achieving the parametric convergence rate.
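To make the estimation recipe concrete, the sketch below illustrates the general idea for the Kullback-Leibler case using the standard Donsker-Varadhan variational form, $\mathsf{D}_{\mathsf{KL}}(P\|Q)=\sup_{g}\,\mathbb{E}_P[g(X)]-\log\mathbb{E}_Q[e^{g(Y)}]$, with the supremum restricted to a shallow (one-hidden-layer) NN and expectations replaced by sample means. This is a minimal, hedged illustration of the neural estimation paradigm the abstract describes, not the paper's implementation; the network width, optimizer, step count, and Gaussian example are illustrative assumptions.

```python
# Minimal sketch of a neural KL-divergence estimator via the Donsker-Varadhan
# variational form, with a shallow ReLU network as the variational function.
# Hyperparameters (width, steps, lr) are illustrative, not from the paper.
import torch
import torch.nn as nn


class ShallowCritic(nn.Module):
    """One-hidden-layer ReLU network g_theta: R^d -> R."""

    def __init__(self, dim: int, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, width), nn.ReLU(), nn.Linear(width, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)


def dv_objective(critic, x_p, x_q):
    """Empirical Donsker-Varadhan objective: mean_P[g] - log mean_Q[exp(g)]."""
    log_mean_exp_q = torch.logsumexp(critic(x_q), dim=0) - torch.log(
        torch.tensor(float(len(x_q)))
    )
    return critic(x_p).mean() - log_mean_exp_q


def neural_kl_estimate(x_p, x_q, width=64, steps=2000, lr=1e-3):
    """Estimate D_KL(P || Q) from samples x_p ~ P and x_q ~ Q."""
    critic = ShallowCritic(x_p.shape[1], width)
    opt = torch.optim.Adam(critic.parameters(), lr=lr)
    for _ in range(steps):
        loss = -dv_objective(critic, x_p, x_q)  # maximize the DV bound
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return dv_objective(critic, x_p, x_q).item()


if __name__ == "__main__":
    torch.manual_seed(0)
    x_p = torch.randn(5000, 2) + 1.0  # samples from P = N((1,1), I)
    x_q = torch.randn(5000, 2)        # samples from Q = N(0, I)
    print("estimated KL:", neural_kl_estimate(x_p, x_q))  # true value is 1.0
```

The two error sources discussed in the abstract appear directly here: restricting the supremum to the shallow NN class incurs the function approximation error, while replacing expectations with sample means incurs the empirical estimation error.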