Clustering algorithms remain valuable tools for grouping and summarizing the most important aspects of data. Example areas where this is the case include image segmentation, dimension reduction, signals analysis, model order reduction, numerical analysis, and others. As a consequence, many clustering approaches have been developed to satisfy the unique needs of each particular field. In this article, we present a family of data-adaptive partitioning algorithms that unifies several well-known methods (e.g., k-means and k-subspaces). Indexed by a single parameter and employing a common minimization strategy, the algorithms are easy to use and interpret, and scale well to large, high-dimensional problems. In addition, we develop an adaptive mechanism that (a) exhibits skill at automatically uncovering data structures and problem parameters without any expert knowledge and, (b) can be used to augment other existing methods. By demonstrating the performance of our methods on examples from disparate fields including subspace clustering, model order reduction, and matrix approximation, we hope to highlight their versatility and potential for extending the boundaries of existing scientific domains. We believe our family's parametrized structure represents a synergism of algorithms that will foster new developments and directions, not least within the data science community.
翻译:暂无翻译