We investigate the statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating a distribution supported on such a low-dimensional structure, for example a low-dimensional manifold, is challenging because the distribution is singular with respect to the Lebesgue measure on the ambient space. In the considered model, the usual likelihood approach can fail to estimate the target distribution consistently because of this singularity. We prove that perturbing the data with instance noise is a novel and effective remedy, leading to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is general enough to contain various structured distributions, such as product distributions, classically smooth distributions, and distributions supported on a low-dimensional manifold. Our analysis provides some insight into how deep generative models can avoid the curse of dimensionality in nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to demonstrate empirically that the proposed data perturbation technique significantly improves estimation performance.
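As a rough illustration of the data perturbation idea, the sketch below draws points from a distribution supported on the unit circle in the plane (singular with respect to the Lebesgue measure on the ambient space) and adds instance noise to each observation. The abstract does not specify the noise distribution or its scale, so the isotropic Gaussian noise, the value of `sigma`, the circle example, and the helper names `sample_circle` and `perturb` are all assumptions made for illustration. The subsequent likelihood-based fit of a deep generative model is not shown.

```python
import numpy as np


def sample_circle(n, rng):
    """Draw n points uniformly from the unit circle in R^2.

    The resulting distribution lives on a one-dimensional manifold and has
    no density with respect to the Lebesgue measure on R^2.
    """
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack([np.cos(theta), np.sin(theta)])


def perturb(data, sigma, rng):
    """Add independent N(0, sigma^2 I) noise to every observation.

    Gaussian noise is an assumption here; the perturbed data have a
    Lebesgue density, so a likelihood is well defined for them.
    """
    return data + sigma * rng.standard_normal(data.shape)


rng = np.random.default_rng(0)
x = sample_circle(1000, rng)               # singular data on the circle
x_tilde = perturb(x, sigma=0.05, rng=rng)  # perturbed data used for fitting

# Distances from the origin: constant for the clean data, spread out
# around 1 once the noise has been added.
radii_clean = np.linalg.norm(x, axis=1)
radii_noisy = np.linalg.norm(x_tilde, axis=1)
```

In this sketch the clean sample sits exactly on the circle, while the perturbed sample fills a thin band around it. A likelihood-based estimator would then be fitted to `x_tilde` rather than to `x`.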