We propose kernel PCA as a method for analyzing the dependence structure of multivariate extremes and demonstrate that it can be a powerful tool for clustering and dimension reduction. Our work provides some theoretical insight into the preimages obtained by kernel PCA, demonstrating that under certain conditions they can effectively identify clusters in the data. We build on these new insights to characterize rigorously the performance of kernel PCA based on an extremal sample, i.e., the angular part of random vectors for which the radius exceeds a large threshold. More specifically, we focus on the asymptotic dependence of multivariate extremes characterized by the angular or spectral measure in extreme value theory and provide a careful analysis in the case where the extremes are generated from a linear factor model. We give theoretical guarantees on the performance of kernel PCA preimages of such extremes by leveraging their asymptotic distribution together with Davis-Kahan perturbation bounds. Our theoretical findings are complemented with numerical experiments illustrating the finite sample performance of our methods.
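As a rough illustration of the extremal sample described above, the following minimal sketch simulates a toy linear factor model with heavy-tailed factors, keeps the angular parts of the observations whose radius exceeds a high empirical quantile, and computes Gaussian-kernel PCA scores from the centered Gram matrix. The loading matrix `A`, the Pareto factors, the Euclidean norm, the 95% threshold, the bandwidth `gamma`, and the helper names `extremal_angles` and `kernel_pca_scores` are all assumptions chosen for illustration and are not taken from the paper. The sketch stops at the scores and does not compute preimages.

```python
import numpy as np


def extremal_angles(X, quantile=0.95):
    """Angular parts X / ||X|| of the rows whose radius exceeds an empirical quantile.

    The Euclidean norm and the quantile level are illustrative assumptions.
    """
    radius = np.linalg.norm(X, axis=1)
    threshold = np.quantile(radius, quantile)
    keep = radius > threshold
    return X[keep] / radius[keep, None]


def kernel_pca_scores(theta, n_components=2, gamma=5.0):
    """Gaussian-kernel PCA scores from the eigendecomposition of the centered Gram matrix."""
    sq_norms = np.sum(theta**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * theta @ theta.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))

    # Center the kernel matrix in feature space: Kc = H K H with H = I - (1/n) 1 1^T.
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = H @ K @ H

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Projections of the training points onto the leading kernel principal axes.
    scores = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return scores, eigvals


# Toy linear factor model X = A Z with heavy-tailed factors (hypothetical setup).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                      # hypothetical 3 x 2 loading matrix
Z = rng.pareto(2.0, size=(5000, 2)) + 1.0      # classical Pareto factors, tail index 2
X = Z @ A.T

theta = extremal_angles(X, quantile=0.95)
scores, eigvals = kernel_pca_scores(theta, n_components=2, gamma=5.0)
```

In a factor model of this kind, the extreme angles are expected to concentrate near the normalized columns of the loading matrix, so the leading scores should separate the extremal sample into groups. Both the threshold and the bandwidth would need tuning on real data.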