Clustering based on the random walk operator has proven effective for undirected graphs, but generalizing it to directed graphs (digraphs) is much more challenging. Although the random walk operator is well-defined for digraphs, most such graphs are not strongly connected, so the associated random walks are not irreducible, even though irreducibility is a crucial property for clustering that arises naturally in the undirected setting. The usual workaround is either to naively symmetrize the adjacency matrix or to replace the natural random walk operator with the teleporting random walk operator, but both can discard valuable information carried by edge directionality. In this paper, we introduce a new clustering framework, the Parametrized Random Walk Diffusion Kernel Clustering (P-RWDKC), which handles both directed and undirected graphs. Our framework is based on diffusion geometry and the generalized spectral clustering framework. Accordingly, we propose an algorithm that automatically reveals the cluster structure at a given scale by considering the random walk dynamics associated with a parametrized kernel operator and by estimating its critical diffusion time. Experiments on $K$-NN graphs constructed from real-world datasets, and on real-world graphs, show that our clustering approach performs well in all tested cases and outperforms existing approaches in most of them.
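To make the irreducibility issue concrete, the following is a minimal sketch in plain Python. It is not the P-RWDKC algorithm, only an illustration of the background objects the abstract mentions: the natural random walk operator $P = D^{-1}A$, a strong-connectivity check, and a PageRank-style teleporting operator. The function names, the example graph, the damping value `alpha=0.85`, and the uniform treatment of dangling nodes are all assumptions made for illustration.

```python
from collections import deque


def random_walk_operator(adj):
    """Row-normalize an adjacency matrix (list of lists) into P = D^{-1} A.

    Rows with zero out-degree (dangling nodes) are left as all zeros.
    """
    P = []
    for row in adj:
        out_degree = sum(row)
        P.append([w / out_degree if out_degree > 0 else 0.0 for w in row])
    return P


def _reachable(adj, start, transpose=False):
    """Return the set of nodes reachable from `start` by breadth-first search.

    With transpose=True the search follows edges backwards.
    """
    n = len(adj)
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in range(n):
            weight = adj[v][u] if transpose else adj[u][v]
            if weight > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_strongly_connected(adj):
    """True iff node 0 reaches every node and every node reaches node 0."""
    n = len(adj)
    return (len(_reachable(adj, 0)) == n
            and len(_reachable(adj, 0, transpose=True)) == n)


def teleporting_operator(adj, alpha=0.85):
    """PageRank-style teleporting walk: alpha * P + (1 - alpha) / n.

    Assumption: dangling rows are replaced by the uniform distribution
    so that every row sums to 1.
    """
    n = len(adj)
    T = []
    for row in random_walk_operator(adj):
        if sum(row) == 0:
            T.append([1.0 / n] * n)
        else:
            T.append([alpha * p + (1 - alpha) / n for p in row])
    return T


# Hypothetical 3-node digraph: 0 -> 1, 1 -> 2, 2 -> 1.
# Node 0 can never be re-entered, so the graph is not strongly connected.
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 1, 0]]

P = random_walk_operator(A)
T = teleporting_operator(A)
```

Here `is_strongly_connected(A)` is false, so the natural walk `P` is reducible, whereas every entry of `T` is positive and the teleporting walk is irreducible. The price is that `T` adds transitions between all pairs of nodes, including ones the original digraph does not allow.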