Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remains a challenging setting for such problems. Metrics that quantify the disparity between probability distributions, such as the Stein discrepancy, play an important role in statistical testing in high dimensions. In this paper, we investigate the role of $L^2$ regularization in training a neural network Stein critic to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. Motivated by Neural Tangent Kernel (NTK) theory, we develop a novel staging procedure for the regularization weight over training time, which leverages the advantages of highly regularized training at early times while empirically delaying the onset of overfitting. Theoretically, we prove that when the $L^2$ regularization weight is large, the training dynamics are approximated by kernel optimization, namely ``lazy training''. This result guarantees learning of the optimal critic assuming sufficient alignment with the leading eigenmodes of the zero-time NTK. The benefit of the staged $L^2$ regularization is demonstrated on simulated high-dimensional distribution-drift data and in an application to evaluating generative models of image data.
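As a rough illustration of the staged-regularization idea summarized above (not the paper's reference implementation), the following PyTorch sketch trains a neural Stein critic $f_\theta$ by maximizing the regularized Stein objective $\mathbb{E}_{x\sim p}[\,s_q(x)^\top f_\theta(x) + \nabla\!\cdot\! f_\theta(x)\,] - \tfrac{\lambda(t)}{2}\,\mathbb{E}_{x\sim p}\|f_\theta(x)\|^2$, with a weight $\lambda(t)$ that is large early in training and decays later. The nominal model is assumed to be a standard Gaussian (so its score is $s_q(x)=-x$), and the names `critic`, `lam_schedule`, and the schedule constants are illustrative placeholders rather than the paper's choices.

```python
import torch
import torch.nn as nn

dim = 50
critic = nn.Sequential(nn.Linear(dim, 128), nn.Softplus(),
                       nn.Linear(128, 128), nn.Softplus(),
                       nn.Linear(128, dim))

def score_q(x):
    # Score of the assumed nominal model q = N(0, I): grad_x log q(x) = -x.
    return -x

def stein_operator(x):
    # T_q f(x) = s_q(x)^T f(x) + div_x f(x); divergence computed exactly via autograd.
    x = x.requires_grad_(True)
    f = critic(x)
    div = torch.zeros(x.shape[0], device=x.device)
    for i in range(x.shape[1]):
        grad_i = torch.autograd.grad(f[:, i].sum(), x, create_graph=True)[0]
        div = div + grad_i[:, i]
    return (score_q(x) * f).sum(dim=1) + div

def lam_schedule(t, t_total, lam_hi=10.0, lam_lo=0.1):
    # Staged L^2 weight: heavily regularized early phase, smaller weight afterwards
    # (a simple two-stage decay chosen here purely for illustration).
    return lam_hi if t < t_total // 4 else lam_lo

opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
data = torch.randn(4096, dim) + 0.5          # stand-in for samples from the unknown p
t_total = 2000
for t in range(t_total):
    x = data[torch.randint(0, data.shape[0], (256,))]
    lam = lam_schedule(t, t_total)
    # Maximize E_p[T_q f] - (lam/2) E_p[|f|^2], i.e., minimize its negative.
    loss = -stein_operator(x).mean() + 0.5 * lam * critic(x).pow(2).sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

With a large early-stage $\lambda$, the critic stays close to the lazy-training (NTK) regime described in the abstract; decaying $\lambda$ later allows the network to fit finer structure while, empirically, delaying overfitting.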