We systematically study the spectrum of the kernel-based graph Laplacian (GL) constructed from a high-dimensional, noisy random point cloud in the nonnull setup, where the point cloud is sampled from a low-dimensional geometric object, such as a manifold, and corrupted by high-dimensional noise. We quantify how the signal and noise interact over different regimes of signal-to-noise ratio (SNR) and report the resulting peculiar spectral behavior of the GL. In addition, we explore how the choice of kernel bandwidth affects the spectrum of the GL over different regimes of SNR, which leads to an adaptive choice of bandwidth that coincides with common practice on real data. This result provides theoretical support for what practitioners do when the dataset is noisy.
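To make the setup concrete, the following is a minimal sketch of the kind of object the abstract describes, not the construction analyzed in the paper. It assumes a Gaussian kernel, a bandwidth set to a quantile of the pairwise squared distances (one widely used data-driven rule, taken here as an assumption), and the random-walk normalization of the affinity matrix. The function names `noisy_circle` and `graph_laplacian_spectrum`, the circle used as the low-dimensional object, and the parameter `signal_strength` as a rough proxy for SNR are all hypothetical choices made for illustration.

```python
import numpy as np


def noisy_circle(n, p, signal_strength, rng):
    """Sample n points from a circle embedded in R^p and add Gaussian noise.

    The circle is a stand-in for a low-dimensional manifold (an assumption);
    signal_strength scales the clean signal relative to unit-variance noise.
    """
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    clean = np.zeros((n, p))
    clean[:, 0] = signal_strength * np.cos(theta)
    clean[:, 1] = signal_strength * np.sin(theta)
    noise = rng.standard_normal((n, p))
    return clean + noise


def graph_laplacian_spectrum(points, quantile=0.5):
    """Eigenvalues (descending) of the normalized affinity matrix.

    Assumed construction: W_ij = exp(-||x_i - x_j||^2 / h), where h is the
    given quantile of the off-diagonal squared distances. The matrix
    D^{-1/2} W D^{-1/2} is symmetric and has the same eigenvalues as the
    transition matrix D^{-1} W.
    """
    n = points.shape[0]
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off

    off_diagonal = sq_dists[~np.eye(n, dtype=bool)]
    bandwidth = np.quantile(off_diagonal, quantile)

    affinity = np.exp(-sq_dists / bandwidth)
    degrees = affinity.sum(axis=1)
    symmetric = affinity / np.sqrt(np.outer(degrees, degrees))

    eigenvalues = np.linalg.eigvalsh(symmetric)[::-1]
    return eigenvalues, bandwidth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Compare a weak-signal and a strong-signal regime at fixed n and p.
    for strength in (1.0, 50.0):
        pts = noisy_circle(n=200, p=400, signal_strength=strength, rng=rng)
        vals, h = graph_laplacian_spectrum(pts)
        print(f"strength={strength}: bandwidth={h:.1f}, top eigenvalues={vals[:5]}")
```

Running the script for several values of `signal_strength` and `quantile` gives a rough empirical feel for how the leading eigenvalues move as the signal grows relative to the noise. It makes no claim about the regimes or rates established in the paper.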