无界目标函数随机优化的集中不等式及其在去噪分数匹配中的应用 (Concentration Inequalities for Stochastic Optimization of Unbounded Objective Functions with Application to Denoising Score Matching)

We derive novel concentration inequalities that bound the statistical error for a large class of stochastic optimization problems, focusing on the case of unbounded objective functions. Our derivations utilize the following key tools: 1) A new form of McDiarmid's inequality that is based on sample-dependent one-component mean-difference bounds and which leads to a novel uniform law of large numbers result for unbounded functions. 2) A new Rademacher complexity bound for families of functions that satisfy an appropriate sample-dependent Lipschitz property, which allows for application to a large class of distributions with unbounded support. As an application of these results, we derive statistical error bounds for denoising score matching (DSM), an application that inherently requires one to consider unbounded objective functions and distributions with unbounded support, even in cases where the data distribution has bounded support. In addition, our results quantify the benefit of sample-reuse in algorithms that employ easily-sampled auxiliary random variables in addition to the training data, e.g., as in DSM, which uses auxiliary Gaussian random variables.

翻译：本文推导了一类新颖的集中不等式，用于界定一大类随机优化问题的统计误差，重点关注目标函数无界的情形。我们的推导运用了以下关键工具：1）一种基于样本依赖性单分量均值差界的新型McDiarmid不等式，该不等式导出了无界函数的新型一致大数定律结果；2）针对满足适当样本依赖性Lipschitz性质的函数族，提出了一种新的Rademacher复杂度界，这使得该方法能够适用于一大类具有无界支撑集的分布。作为这些结果的应用，我们推导了去噪分数匹配（DSM）的统计误差界——该应用本质上要求考虑无界目标函数及无界支撑集的分布，即使在数据分布具有有界支撑集的情况下亦如此。此外，我们的结果量化了在训练数据之外还采用易采样辅助随机变量的算法（例如DSM中使用的辅助高斯随机变量）中样本重用的优势。