Effective non-parametric density estimation is a key challenge in high-dimensional multivariate data analysis. In this paper, we propose a novel approach that builds upon tensor factorization tools. Any multivariate density can be represented by its characteristic function, via the Fourier transform. If the sought density is compactly supported, then its characteristic function can be approximated, within controllable error, by a finite tensor of leading Fourier coefficients, whose size depends on the smoothness of the underlying density. This tensor can be naturally estimated from observed realizations of the random vector of interest, via sample averaging. In order to circumvent the curse of dimensionality, we introduce a low-rank model of this characteristic tensor, which significantly improves the density estimate, especially for high-dimensional data and/or in the sample-starved regime. By virtue of uniqueness of low-rank tensor decomposition, under certain conditions, our method enables learning the true data-generating distribution. We demonstrate the very promising performance of the proposed method using several measured datasets.
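As a rough illustration of the sample-averaging step described above, the following minimal sketch estimates a truncated tensor of Fourier coefficients from data and evaluates the resulting truncated Fourier series. It rests on several assumptions that are not taken from the paper: the support is rescaled to the unit cube [0, 1]^D, the same cutoff `K` is used in every dimension, and the function names `empirical_characteristic_tensor` and `evaluate_density` are hypothetical. The low-rank modeling step is deliberately omitted, so this shows only the plain (full-tensor) estimate, which is practical only for small D.

```python
import numpy as np

_LETTERS = "abcdefghijklmnopqrstuvwxy"  # einsum index labels, one per dimension


def empirical_characteristic_tensor(samples, K):
    """Sample-average estimate of the leading Fourier coefficients.

    samples: (N, D) array, assumed to lie in [0, 1]^D (an assumption of
    this sketch). Returns a complex tensor of shape (2K+1,)*D whose entry
    at index offsets (k_1, ..., k_D), k_d in -K..K, is the mean over the
    samples of exp(j 2 pi k . x).
    """
    n, d = samples.shape
    ks = np.arange(-K, K + 1)
    # One (N, 2K+1) factor per dimension: exp(j 2 pi k x_{n,i}).
    factors = [np.exp(2j * np.pi * samples[:, [i]] * ks[None, :]) for i in range(d)]
    # Average over samples of the outer product of the per-dimension factors.
    subs = ",".join("z" + _LETTERS[i] for i in range(d)) + "->" + _LETTERS[:d]
    return np.einsum(subs, *factors) / n


def evaluate_density(phi, points):
    """Evaluate the truncated Fourier series with coefficients phi at points (M, D)."""
    d = phi.ndim
    K = (phi.shape[0] - 1) // 2
    ks = np.arange(-K, K + 1)
    factors = [np.exp(-2j * np.pi * points[:, [i]] * ks[None, :]) for i in range(d)]
    subs = _LETTERS[:d] + "," + ",".join("z" + _LETTERS[i] for i in range(d)) + "->z"
    # The imaginary part vanishes up to rounding because phi is conjugate-symmetric.
    return np.einsum(subs, phi, *factors).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.beta(2.0, 5.0, size=(2000, 2))      # synthetic data on [0, 1]^2
    phi = empirical_characteristic_tensor(x, K=5)
    grid = np.arange(32) / 32
    pts = np.stack(np.meshgrid(grid, grid, indexing="ij"), axis=-1).reshape(-1, 2)
    f_hat = evaluate_density(phi, pts)
    print(phi.shape, f_hat.shape)
```

The full tensor has (2K+1)^D entries, so its size grows exponentially with the dimension; this is the cost that the low-rank model of the characteristic tensor is meant to avoid. Note also that a truncated series of this kind is not guaranteed to be non-negative everywhere.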