We investigate the statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional manifold, is challenging because the distribution is singular with respect to the Lebesgue measure in the ambient space. In the considered model, the usual likelihood approach can fail to estimate the target distribution consistently because of this singularity. We prove that a novel and effective solution exists: perturbing the data with instance noise leads to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is sufficiently general to contain various structured distributions, such as product distributions, classically smooth distributions, and distributions supported on a low-dimensional manifold. Our analysis provides some insight into how deep generative models can avoid the curse of dimensionality in nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to demonstrate empirically that the proposed data perturbation technique significantly improves estimation performance.
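To make the singularity issue concrete, the following is a minimal toy sketch, not the deep generative model or the estimator studied here. It assumes data lying exactly on a line in two dimensions and uses a plain Gaussian model as a stand-in: the Gaussian log-likelihood contains a term in the log-determinant of the covariance, which diverges when the smallest eigenvalue of the sample covariance is zero. The helper names `perturb` and `smallest_cov_eigenvalue`, the noise level `sigma = 0.1`, and the choice of isotropic Gaussian noise are all illustrative assumptions.

```python
import numpy as np


def perturb(x, sigma, rng):
    """Add isotropic Gaussian noise with standard deviation sigma (hypothetical helper)."""
    return x + sigma * rng.standard_normal(x.shape)


def smallest_cov_eigenvalue(x):
    """Smallest eigenvalue of the sample covariance of the rows of x.

    A value near zero means the sample is numerically singular, so a
    Gaussian likelihood fitted to it is unbounded.
    """
    cov = np.cov(x, rowvar=False)
    return float(np.linalg.eigvalsh(cov)[0])  # eigvalsh returns ascending order


rng = np.random.default_rng(0)
n, sigma = 2000, 0.1  # sample size and noise level are assumed values

# Toy singular data: points on the line x2 = 2 * x1 in R^2.
t = rng.uniform(-1.0, 1.0, size=n)
x = np.stack([t, 2.0 * t], axis=1)

x_noisy = perturb(x, sigma, rng)

lam_clean = smallest_cov_eigenvalue(x)        # numerically zero
lam_noisy = smallest_cov_eigenvalue(x_noisy)  # roughly sigma ** 2

print(f"smallest eigenvalue without noise: {lam_clean:.2e}")
print(f"smallest eigenvalue with noise:    {lam_noisy:.2e}")
```

In this toy setting the perturbed sample has a non-degenerate covariance, so the likelihood is well defined. The sketch shows only that adding noise removes the degeneracy, and says nothing about how the noise level should be chosen or about convergence rates.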