Dimensionality reduction (DR) algorithms compress high-dimensional data into a lower-dimensional representation while preserving important features of the data. DR is a critical step in many analysis pipelines, as it enables visualisation, noise reduction, and efficient downstream processing of the data. In this work, we introduce the ProbDR variational framework, which interprets a wide range of classical DR algorithms as probabilistic inference algorithms. ProbDR encompasses PCA, CMDS, LLE, LE, MVU, diffusion maps, kPCA, Isomap, (t-)SNE, and UMAP. In our framework, a low-dimensional latent variable is used to construct a covariance, precision, or graph Laplacian matrix, which can be used as part of a generative model for the data. Inference is performed by optimising an evidence lower bound. We demonstrate the internal consistency of our framework and show that it enables the use of probabilistic programming languages (PPLs) for DR. Additionally, we illustrate that the framework facilitates reasoning about unseen data and argue that our generative models approximate Gaussian processes (GPs) on manifolds. By providing a unified view of DR, our framework facilitates communication, reasoning about uncertainties, model composition, and extensions, particularly when domain knowledge is present.
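The covariance construction described above can be illustrated in minimal form: low-dimensional latents X define K = XXᵀ + σ²I, and the observed dimensions are modelled as zero-mean Gaussians with that covariance. The sketch below is a hypothetical illustration, not the paper's implementation: it fits X by direct maximum likelihood (the dual probabilistic-PCA special case) via gradient ascent, rather than by optimising a full evidence lower bound, and the function name and hyperparameters are assumptions for the example.

```python
import numpy as np

def fit_latents(Y, q=1, sigma2=0.1, lr=1e-2, steps=500, seed=0):
    """ProbDR-style sketch: model the d columns of Y as draws from
    N(0, K) with K = X X^T + sigma2 * I built from latents X (n x q).
    X is fit by maximum likelihood here, a simplification of the
    variational inference used in the framework itself."""
    n, d = Y.shape
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.standard_normal((n, q))   # latent coordinates
    S = Y @ Y.T                             # data second moments
    I = np.eye(n)
    for _ in range(steps):
        K = X @ X.T + sigma2 * I
        Kinv = np.linalg.inv(K)
        # gradient of the Gaussian log-likelihood with respect to X
        grad = (Kinv @ S @ Kinv - d * Kinv) @ X
        X += (lr / d) * grad                # gradient ascent step
    return X

# Toy check: 3-D points lying near a 1-D line; the single latent
# dimension should recover the position along that line.
rng = np.random.default_rng(1)
t = np.linspace(-1.0, 1.0, 50)
Y = np.outer(t, rng.standard_normal(3)) + 0.01 * rng.standard_normal((50, 3))
X = fit_latents(Y - Y.mean(axis=0), q=1)
```

Swapping the Gaussian likelihood or the covariance parameterisation (e.g. a precision or graph Laplacian in place of K) is what lets the framework recover the different classical DR algorithms listed above.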