Deep Kernel Principal Component Analysis for Multi-level Feature Learning

Principal Component Analysis (PCA) and its nonlinear extension Kernel PCA (KPCA) are widely used across science and industry for data analysis and dimensionality reduction. Modern deep learning tools have achieved great empirical success, but a framework for deep principal component analysis is still lacking. Here we develop a deep kernel PCA methodology (DKPCA) to extract multiple levels of the most informative components of the data. Our scheme can effectively identify new hierarchical variables, called deep principal components, capturing the main characteristics of high-dimensional data through a simple and interpretable numerical optimization. We couple the principal components of multiple KPCA levels, theoretically showing that DKPCA creates both forward and backward dependency across levels, which has not been explored in kernel methods and yet is crucial to extract more informative features. Various experimental evaluations on multiple data types show that DKPCA finds more efficient and disentangled representations with higher explained variance in fewer principal components, compared to the shallow KPCA. We demonstrate that our method allows for effective hierarchical data exploration, with the ability to separate the key generative factors of the input data both for large datasets and when few training samples are available. Overall, DKPCA can facilitate the extraction of useful patterns from high-dimensional data by learning more informative features organized in different levels, giving diversified aspects to explore the variation factors in the data, while maintaining a simple mathematical formulation.

Related content

PCA

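In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two-, three-, or higher-dimensional space, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.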
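The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration of ordinary linear PCA via the singular value decomposition, not the DKPCA method from the abstract; the function name `pca`, the synthetic data, and the choice of two components are assumptions made for the example.

```python
import numpy as np


def pca(X, n_components):
    """Linear PCA of X with shape (n_samples, n_features).

    Returns the principal directions (one per row), the data projected
    onto them, and the sample variance captured by each direction.
    """
    # Center the data so that variance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing
    # singular value, i.e. by decreasing variance along that direction.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Project the centered data onto the selected directions.
    scores = X_centered @ components.T
    # Sample variance along each direction (ddof = 1).
    explained_variance = (S ** 2)[:n_components] / (X.shape[0] - 1)
    return components, scores, explained_variance


# Hypothetical synthetic data: 200 points in 3-D with correlated features.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

components, scores, explained_variance = pca(X, n_components=2)
```

In this sketch the rows of `components` form an orthonormal set, and the columns of `scores` are uncorrelated, which matches the property of the orthogonal basis described in the paragraph above.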
