New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance density and geometry in the underlying data, we clarify their key parameters and discuss how they may be chosen in practice. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results require only that the underlying data is sampled from a low-dimensional manifold, and depend crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.
翻译:演示了对电量加权最短路径(PWSPDs)的新的几何和计算分析。 通过在基础数据中展示这些计量平衡密度和几何学的方法,我们澄清了关键参数,并讨论了如何在实践中选择这些参数。比较了相关的数据驱动度,这些参数显示了密度在以内核为基础的、不受监督和半监督的机器学习中的更广泛作用。计算了,我们把完整加权图形中的PWSPDs与加权最近的近邻图形中的模拟相连接,提供了近于最佳的等值的高度概率保障。我们开发了与精确理论的联系,以确定有限样本环境中PWSPDs偏差和差异的估计数。理论结果得到了说明性实验的支持,展示了PWSPDs在广泛数据设置中的多功能性。在整个文件中,我们的结果只需要从一个低维元中提取基本数据,并关键地依赖于该元的内在维度,而不是其周围维度。