Originally, tangles were invented as an abstract tool in mathematical graph theory to prove the famous graph minor theorem. In this paper, we showcase the practical potential of tangles in machine learning applications. Given a collection of cuts of any dataset, tangles aggregate these cuts to point in the direction of a dense structure. As a result, a cluster is softly characterized by a set of consistent pointers. This highly flexible approach can solve clustering problems in various setups, ranging from questionnaires over community detection in graphs to clustering points in metric spaces. The output of our proposed framework is hierarchical and induces the notion of a soft dendrogram, which can help explore the cluster structure of a dataset. The computational complexity of aggregating the cuts is linear in the number of data points. Thus the bottleneck of the tangle approach is to generate the cuts, for which simple and fast algorithms form a sufficient basis. In our paper we construct the algorithmic framework for clustering with tangles, prove theoretical guarantees in various settings, and provide extensive simulations and use cases. Python code is available on github.
翻译:最初, 三角形被发明为数学图形理论中的抽象工具, 以证明著名的图形小理论。 在本文中, 我们展示了机器学习应用程序中三角形的实际潜力。 在收集了任何数据集的切分后, 三角形将这些切分集中到密度结构的方向上。 因此, 集群以一组一致的指针进行软化的特征描述。 这种高度灵活的方法可以解决各种设置中的组合问题, 从图表中的社区探测问卷到多空间的组合点。 我们拟议框架的输出是等级化的, 并引出软三角形的概念, 它可以帮助探索数据集的群集结构。 集结的计算复杂性是数据点数的线性。 因此, 矩形方法的瓶颈是生成切分, 其简单和快速的算法构成充分的基础。 在我们的文件中, 我们构建了以三角形组合为主的算法框架, 证明各种环境的理论保证, 并提供广泛的模拟和使用案例。 Python 代码可以在 github 上找到 。