Principal Component Analysis (PCA) and its nonlinear extension Kernel PCA (KPCA) are widely used across science and industry for data analysis and dimensionality reduction. Modern deep learning tools have achieved great empirical success, but a framework for deep principal component analysis is still lacking. Here we develop a deep kernel PCA methodology (DKPCA) to extract multiple levels of the most informative components of the data. Our scheme can effectively identify new hierarchical variables, called deep principal components, that capture the main characteristics of high-dimensional data through a simple and interpretable numerical optimization. We couple the principal components of multiple KPCA levels and show theoretically that DKPCA creates both forward and backward dependency across levels, which has not been explored in kernel methods and yet is crucial for extracting more informative features. Experimental evaluations on multiple data types show that DKPCA finds more efficient and disentangled representations, with higher explained variance in fewer principal components, than shallow KPCA. We demonstrate that our method allows for effective hierarchical data exploration, with the ability to separate the key generative factors of the input data both for large datasets and when few training samples are available. Overall, DKPCA can facilitate the extraction of useful patterns from high-dimensional data by learning more informative features organized in different levels, offering multiple perspectives from which to explore the factors of variation in the data, while maintaining a simple mathematical formulation.
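For readers less familiar with the baseline, the following is a minimal sketch of ordinary single-level (shallow) KPCA and of the explained-variance ratio used as a comparison metric. It is not the DKPCA method itself, whose coupled multi-level optimization is not specified in this abstract. The RBF kernel, the `gamma` value, the random data, and the function names `rbf_kernel` and `kernel_pca` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Pairwise RBF kernel matrix (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pca(X, n_components, gamma=1.0):
    """Shallow KPCA: eigendecomposition of the centered kernel matrix."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # kernel centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    # Fraction of total feature-space variance captured by each component.
    explained = eigvals[:n_components] / eigvals.sum()
    # Projections of the training points onto the leading components.
    scores = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return scores, explained

# Hypothetical toy data, only to exercise the functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
scores, explained = kernel_pca(X, n_components=3, gamma=0.1)
print(scores.shape, explained)
```

In this sketch, "higher explained variance in fewer principal components" would correspond to the leading entries of `explained` summing to a larger fraction for a given number of components.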