Principal Component Analysis (PCA) is a popular method for dimensionality reduction and has attracted sustained interest for decades. More recently, kernel PCA (KPCA) has emerged as an extension of PCA but, despite its use in practice, a sound theoretical understanding of KPCA is still missing. We contribute several lower and upper bounds on the efficiency of KPCA, involving the empirical eigenvalues of the kernel Gram matrix and new quantities based on a notion of variance. These bounds show how much information is captured by KPCA on average and contribute to a better theoretical understanding of its efficiency. We demonstrate that fast convergence rates are achievable for a widely used class of kernels, and we highlight desirable properties of datasets that ensure KPCA efficiency.
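The empirical eigenvalues of the kernel Gram matrix mentioned above are the central quantities in the bounds. A minimal sketch of how they arise, assuming a toy Gaussian dataset and an RBF kernel with an arbitrarily chosen bandwidth (both hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # hypothetical toy dataset, n = 50 points in R^3

# RBF (Gaussian) kernel Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2);
# the bandwidth gamma = 0.5 is an illustrative choice
gamma = 0.5
sq = np.sum(X**2, axis=1)
K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# Center the Gram matrix in feature space: Kc = H K H with H = I - (1/n) 11^T,
# which corresponds to subtracting the empirical mean in the feature space
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H

# Empirical eigenvalues of the centered kernel Gram matrix, in descending order;
# KPCA projects onto the eigenvectors associated with the largest of these
eigvals = np.linalg.eigvalsh(Kc)[::-1]
print(eigvals[:5])
```

How quickly these eigenvalues decay governs how much variance the first few kernel principal components capture, which is the kind of quantity the bounds in the paper are expressed in.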