Learning generative probabilistic models is a core problem in machine learning, one that presents significant challenges due to the curse of dimensionality. This paper proposes a joint dimensionality reduction and non-parametric density estimation framework, using a novel estimator that can explicitly capture the underlying distribution of appropriate reduced-dimension representations of the input data. The idea is to jointly design a nonlinear dimensionality-reducing auto-encoder that models the training data in terms of a parsimonious set of latent random variables, and to learn a canonical low-rank tensor model of the joint distribution of the latent variables in the Fourier domain. The proposed latent density model is non-parametric and universal, as opposed to the predefined prior assumed in variational auto-encoders. Joint optimization of the auto-encoder and the latent density estimator is pursued via a formulation that learns both by minimizing a combination of the negative log-likelihood in the latent domain and the auto-encoder reconstruction loss. We demonstrate that the proposed model achieves very promising results on regression, sampling, and anomaly-detection tasks using toy, tabular, and image datasets.
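The joint objective described above can be sketched as follows. This is a minimal, hypothetical illustration only: the encoder and decoder are linear maps, and a fitted diagonal Gaussian stands in for the paper's Fourier-domain low-rank tensor density model; the weighting factor `lam` is an assumed hyperparameter, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))          # toy training data, 10-dimensional
W_enc = rng.normal(size=(10, 3)) * 0.1  # linear encoder weights (latent dim 3)
W_dec = rng.normal(size=(3, 10)) * 0.1  # linear decoder weights

def joint_loss(X, W_enc, W_dec, lam=0.1):
    """Combined objective: reconstruction loss + lam * latent negative log-likelihood."""
    Z = X @ W_enc                        # latent codes
    X_hat = Z @ W_dec                    # reconstruction
    recon = np.mean((X - X_hat) ** 2)
    # NLL under a diagonal Gaussian fitted to the latent codes -- a placeholder
    # for the non-parametric Fourier-domain tensor density model in the paper.
    mu, var = Z.mean(axis=0), Z.var(axis=0) + 1e-6
    nll = 0.5 * np.mean(
        np.sum((Z - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)
    )
    return recon + lam * nll

loss = joint_loss(X, W_enc, W_dec)
```

In the actual framework both terms are minimized jointly over the auto-encoder parameters and the density model, so that the latent representation and its estimated distribution are shaped together rather than in separate stages.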