Originally, tangles had been invented as an abstract tool in mathematical graph theory to prove the famous graph minor theorem. In this paper we showcase the practical potential of tangles in machine learning applications. Given a collection of weak cuts (bipartitions) of any dataset, tangles provide a mechanism to aggregate these cuts such that they point in the direction of a dense structure. As a result, a cluster is softly characterized by a set of consistent pointers. This highly flexible approach can solve soft clustering problems in a variety of setups, ranging from questionnaires over community detection in graphs to clustering points in metric spaces. The output of our proposed framework is of a hierarchical nature and induces the notion of a soft dendrogram, which can be helpful to explore the cluster structure of a dataset. The computational complexity of aggregating the cuts is only linear in the number of data points and places the bottleneck on generating the cuts, for which simple and fast algorithms form a sufficient basis. The contribution of this paper is to translate the abstract mathematical literature into algorithms that can be used in machine learning. We demonstrate the power of tangles in many different use cases and provide theoretical guarantees for their performance.
翻译:最初, 三角形被发明为数学图理中的抽象工具, 以证明著名的图形小理论。 在本文中, 我们展示了机器学习应用程序中三角形的实际潜力。 由于收集了任何数据集的微减( 分割区), 三角形提供了一种机制来汇总这些削减, 从而指向稠密结构的方向。 结果, 集群的特征是软化的, 由一组一致的指针组成。 这种高度灵活的方法可以解决各种设置中的软组合问题, 从图表中的社区探测问卷到指标空间中的群集点。 我们拟议框架的输出具有等级性质, 并引出软的刻度图概念, 有助于探索数据集的组合结构。 将切割的计算复杂度只线性在数据点数中, 并将瓶颈放在产生切分上, 而简单和快速的算法构成充分的基础。 本文的贡献是将抽象的数学文献翻译为可被用于机器学习的算法。 我们展示了它们在许多不同案例中使用的理论性保证力。