Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remain a challenging setting for such problems. Metrics that quantify the disparity between probability distributions, such as the Stein discrepancy, play an important role in high-dimensional statistical testing. In this paper, we investigate the role of $L^2$ regularization in training a neural network Stein critic to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. Making a connection to Neural Tangent Kernel (NTK) theory, we develop a novel staging procedure for the regularization weight over training time, which leverages the advantages of highly regularized training at early times. Theoretically, we prove that when the $L^2$ regularization weight is large, the training dynamic is approximated by the kernel optimization, namely the ``lazy training'', and that training on $n$ samples converges at a rate of $O(n^{-1/2})$ up to a log factor. The result guarantees learning the optimal critic assuming sufficient alignment with the leading eigen-modes of the zero-time NTK. The benefit of the staged $L^2$ regularization is demonstrated on simulated high-dimensional data and in an application to evaluating generative models of image data.
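The staging procedure described above can be sketched as a schedule for the $L^2$ weight that stays large during an initial phase (keeping training close to the kernel, or ``lazy'', regime) and then decays for later fine-tuning. The following is a minimal illustrative sketch; the function name `staged_l2_weight`, the geometric decay, and all default values are hypothetical choices, not the paper's actual schedule.

```python
def staged_l2_weight(step, lam_init=1.0, lam_final=0.01, stage_end=1000):
    """Hypothetical staged schedule for the L2 regularization weight.

    Keeps a large weight lam_init for the first stage_end steps
    (approximately the kernel / 'lazy training' regime), then decays
    geometrically toward lam_final over a second stage of equal length.
    """
    if step < stage_end:
        return lam_init
    # Geometric interpolation after the first stage (illustrative choice):
    # fraction grows from 0 to 1 over the second stage, then saturates.
    fraction = min(1.0, (step - stage_end) / stage_end)
    return lam_init * (lam_final / lam_init) ** fraction
```

In a training loop, the returned weight would multiply the $L^2$ penalty on the critic network's parameters at each step, so early updates are heavily regularized and later updates are nearly unconstrained.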