We systematically study the spectrum of the kernel-based graph Laplacian (GL) constructed from a high-dimensional, noisy random point cloud in the non-null setup. The problem is motivated by a model in which the clean signal is sampled from a manifold embedded in a low-dimensional Euclidean subspace and is then corrupted by high-dimensional noise. We quantify how the signal and noise interact over different regions of the signal-to-noise ratio (SNR) and report the resulting peculiar spectral behavior of the GL. In addition, we explore the impact of the chosen kernel bandwidth on the spectrum of the GL over different regions of SNR, which leads to an adaptive choice of kernel bandwidth that coincides with common practice on real data. This result paves the way to a theoretical understanding of how practitioners apply the GL when the dataset is noisy.
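To make the setup concrete, the following is a minimal numerical sketch, not the construction analyzed in the paper. It assumes a Gaussian kernel, i.i.d. standard Gaussian noise, a circle as the manifold, and a bandwidth set to a percentile of the squared pairwise distances (one common data-driven heuristic). The function names `noisy_point_cloud` and `graph_laplacian_spectrum` and the parameters `signal_strength` and `percentile` are hypothetical, introduced only for illustration.

```python
import numpy as np


def noisy_point_cloud(n=200, p=400, signal_strength=20.0, seed=0):
    """Sample n points from a circle placed in the first two coordinates
    of R^p, then add i.i.d. standard Gaussian noise in all p coordinates.

    ASSUMPTION: the circle, the noise model, and `signal_strength`
    (the circle radius squared) are illustrative choices only.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    signal = np.zeros((n, p))
    radius = np.sqrt(signal_strength)
    signal[:, 0] = radius * np.cos(theta)
    signal[:, 1] = radius * np.sin(theta)
    noise = rng.standard_normal((n, p))
    return signal + noise


def graph_laplacian_spectrum(X, percentile=50.0):
    """Return the eigenvalues (descending) of the normalized affinity
    D^{-1/2} W D^{-1/2}, which shares its spectrum with D^{-1} W,
    together with the bandwidth h that was used.

    ASSUMPTION: W_ij = exp(-||x_i - x_j||^2 / h), with h taken as a
    percentile of the squared pairwise distances.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via the Gram matrix.
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    D2 = np.maximum(D2, 0.0)  # clip tiny negative round-off
    np.fill_diagonal(D2, 0.0)

    # Data-driven bandwidth from the off-diagonal squared distances.
    upper = np.triu_indices_from(D2, k=1)
    h = np.percentile(D2[upper], percentile)

    W = np.exp(-D2 / h)
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))  # symmetric normalization

    eigvals = np.linalg.eigvalsh(S)[::-1]  # sort in descending order
    return eigvals, h
```

Under these assumptions, calling `graph_laplacian_spectrum(noisy_point_cloud(signal_strength=s))` for several values of `s` gives a rough empirical view of how the leading eigenvalues change with the signal strength. The eigenvalues of the graph Laplacian in the form I - D^{-1} W are one minus the returned values.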