Statistical distances (SDs), which quantify the dissimilarity between probability distributions, are central to machine learning and statistics. A modern method for estimating such distances from data relies on parametrizing a variational form by a neural network (NN) and optimizing it. These estimators are widely used in practice, but the corresponding performance guarantees are partial and call for further exploration. In particular, there seems to be a fundamental tradeoff between the two sources of error involved: approximation and estimation. While the former requires the NN class to be rich and expressive, the latter relies on controlling its complexity. This paper explores this tradeoff by means of non-asymptotic error bounds, focusing on three popular choices of SDs -- Kullback-Leibler divergence, chi-squared divergence, and squared Hellinger distance. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory. Numerical results validating the theory are also provided.
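As a concrete illustration of the variational approach described above, the following is a minimal sketch of a neural KL-divergence estimator based on the Donsker-Varadhan variational form, D_KL(P||Q) = sup_f E_P[f] - log E_Q[exp(f)], with the supremum restricted to a small NN class. The architecture, optimizer settings, sample sizes, and the Gaussian toy distributions are illustrative assumptions, not choices made in the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of neural KL-divergence estimation via the
# Donsker-Varadhan (DV) variational form:
#     D_KL(P || Q) = sup_f  E_P[f(X)] - log E_Q[exp(f(Y))],
# with the supremum restricted to a neural network class.
# Architecture, optimizer, sample sizes, and the Gaussian toy
# distributions below are illustrative assumptions only.

class Critic(nn.Module):
    """Small fully connected NN playing the role of the DV potential f."""
    def __init__(self, dim, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def dv_objective(f, x_p, x_q):
    """Empirical DV objective: mean_P f - log mean_Q exp(f)."""
    log_mean_exp = torch.logsumexp(f(x_q), dim=0) - torch.log(
        torch.tensor(float(x_q.shape[0])))
    return f(x_p).mean() - log_mean_exp

if __name__ == "__main__":
    dim = 2
    mu = torch.tensor([1.0, 0.0])        # P = N(mu, I), Q = N(0, I)
    true_kl = 0.5 * float(mu @ mu)       # closed form: ||mu||^2 / 2

    critic = Critic(dim)
    opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    for _ in range(2000):                # maximize the DV objective over the NN class
        x_p = torch.randn(512, dim) + mu
        x_q = torch.randn(512, dim)
        loss = -dv_objective(critic, x_p, x_q)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():                # evaluate on fresh samples
        x_p = torch.randn(10000, dim) + mu
        x_q = torch.randn(10000, dim)
        print(f"estimated KL: {dv_objective(critic, x_p, x_q).item():.3f}, "
              f"true KL: {true_kl:.3f}")
```

Analogous estimators for the chi-squared divergence and squared Hellinger distance follow the same pattern, with the DV objective replaced by the corresponding variational form.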