Local intrinsic dimensionality (LID) is an important advancement in data dimensionality analysis, with applications in data mining, machine learning, and similarity search. Existing distance-based LID estimators were designed for tabular datasets whose data points are represented as vectors in a Euclidean space. After discussing their limitations for graph-structured data with respect to graph embeddings and graph distances, we propose NC-LID, a novel LID-related measure that quantifies the discriminatory power of the shortest-path distance with respect to natural communities of nodes, taken as their intrinsic localities. We show how this measure can be used to design LID-aware graph embedding algorithms by formulating two LID-elastic variants of node2vec whose personalized hyperparameters are adjusted according to NC-LID values. Our empirical analysis of NC-LID on a large number of real-world graphs shows that this measure identifies nodes with high link reconstruction errors in node2vec embeddings better than node centrality metrics do. The experimental evaluation also shows that the proposed LID-elastic extensions improve on node2vec by better preserving graph structure in the generated embeddings.
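The abstract does not state a formula for NC-LID, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that the natural community of a node is already known (detecting it is out of scope here) and that the measure takes the form -ln(|C| / |T|), where C is the natural community and T is the set of nodes whose shortest-path distance from the node does not exceed the distance to the farthest community member. The names `bfs_distances` and `nc_lid_sketch` are hypothetical.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop-count shortest-path distances from `source` in an unweighted graph.

    `adj` maps each node to an iterable of its neighbours.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def nc_lid_sketch(adj, node, community):
    """Assumed form of an NC-LID-style score: -ln(|C| / |T|).

    C is the given community of `node` (with `node` itself included);
    T is the set of nodes within shortest-path distance r of `node`,
    where r is the distance to the farthest member of C.
    This formula is an assumption, not taken from the abstract.
    """
    dist = bfs_distances(adj, node)
    members = set(community) | {node}
    unreachable = [m for m in members if m not in dist]
    if unreachable:
        raise ValueError(f"community members not reachable from node: {unreachable}")
    radius = max(dist[m] for m in members)
    ball = [v for v, d in dist.items() if d <= radius]
    # ln(|T| / |C|) equals -ln(|C| / |T|); members is a subset of ball, so the score is >= 0.
    return math.log(len(ball) / len(members))


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
community = {0, 1, 2}
for n in sorted(community):
    print(n, nc_lid_sketch(adj, n, community))
```

Under this assumed form, a node deep inside its community scores zero because the shortest-path ball of the community's radius contains only community members, while a boundary node such as node 2 scores higher because the same ball also captures outside nodes. This is the sense in which the shortest-path distance is less discriminative around that node.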