Graph clustering is a fundamental problem in unsupervised learning, with numerous applications in computer science and in analysing real-world data. In many real-world applications, we find that the clusters have a significant high-level structure. This is often overlooked in the design and analysis of graph clustering algorithms which make strong simplifying assumptions about the structure of the graph. This thesis addresses the natural question of whether the structure of clusters can be learned efficiently and describes four new algorithmic results for learning such structure in graphs and hypergraphs. All of the presented theoretical results are extensively evaluated on both synthetic and real-word datasets of different domains, including image classification and segmentation, migration networks, co-authorship networks, and natural language processing. These experimental results demonstrate that the newly developed algorithms are practical, effective, and immediately applicable for learning the structure of clusters in real-world data.
翻译:图形群集是未经监督的学习中的一个基本问题,在计算机科学和分析真实世界数据方面有许多应用。在许多现实世界应用中,我们发现这些群集具有重要的高层次结构。在设计和分析图组群算法时,往往忽略了这一点,这些算法对图形结构的简单化假设力很强。这个论文探讨的是集群群的结构能否有效学习的自然问题,并描述了在图表和高压图中学习这种结构的四种新的算法结果。所有介绍的理论结果都在不同领域的合成和真实语言数据集上进行了广泛评估,包括图像分类和分解、移徙网络、共同作者网络和自然语言处理。这些实验结果表明,新开发的算法是实用的、有效的,并且可立即用于学习真实世界数据中的群集结构。