Estimating a high-dimensional distribution in Wasserstein distance from i.i.d. data is a fundamental instance of the curse of dimensionality. We explore how structural knowledge about the data-generating process that gives rise to the distribution can be used to overcome this curse. More precisely, we work with the set of distributions of probabilistic graphical models for a known directed acyclic graph. It turns out that this knowledge is only helpful if it can be quantified, which we formalize via smoothness conditions on the transition kernels in the disintegration corresponding to the graph. In this case, we prove that the rate of estimation is governed by the local structure of the graph, more precisely by the dimensions corresponding to single nodes together with their parent nodes. The precise rate depends on the exact notion of smoothness assumed for the kernels: the weak (Wasserstein-Lipschitz) and the strong (bidirectional total-variation-Lipschitz) conditions lead to different results. We prove sharpness under the strong condition and show that this condition is satisfied, for example, by distributions having a positive Lipschitz density.
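
For orientation, the disintegration along the graph and the weak smoothness condition can be sketched as follows. The notation is an assumption for illustration and is not taken from the text: nodes $v = 1, \dots, m$ in topological order, parent sets $\mathrm{pa}(v)$, transition kernels $K_v$, a Lipschitz constant $L$, and the Wasserstein distance of order one, $W_1$.

\[
  \mu(dx_1, \dots, dx_m) \;=\; \prod_{v=1}^{m} K_v\bigl(x_{\mathrm{pa}(v)},\, dx_v\bigr),
  \qquad
  W_1\bigl(K_v(x, \cdot),\, K_v(x', \cdot)\bigr) \;\le\; L\,\lvert x - x' \rvert .
\]

In this sketch, the dimension associated with a node $v$ together with its parents is that of the pair $(x_v, x_{\mathrm{pa}(v)})$, which may be much smaller than the dimension of the full vector $(x_1, \dots, x_m)$.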