Subspace clustering is an important unsupervised clustering approach. It assumes that high-dimensional data points are approximately distributed around several low-dimensional linear subspaces. Most prominent subspace clustering algorithms represent each data point as a linear combination of the other data points, which is known as a self-expressive representation. To overcome the restrictive linearity assumption, numerous nonlinear approaches have been proposed that extend successful subspace clustering methods to data lying on a union of nonlinear manifolds. In this comparative study, we provide a comprehensive overview of the nonlinear subspace clustering approaches proposed in the last decade. We introduce a new taxonomy that classifies the state-of-the-art approaches into three categories: locality preserving, kernel based, and neural network based. The major representative algorithms within each category are extensively compared on carefully designed synthetic and real-world data sets. The detailed analysis of these approaches reveals potential research directions and unsolved challenges in this field.
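For readers unfamiliar with the term, the self-expressive representation can be written as a generic optimization problem. The following is only a sketch in assumed notation, not a formulation taken from the study itself: $X \in \mathbb{R}^{d \times n}$ holds the $n$ data points as columns, $C \in \mathbb{R}^{n \times n}$ is the coefficient matrix, $\lambda > 0$ is a trade-off parameter, and $\Omega$ is a regularizer (for example the $\ell_1$ norm, as in sparse subspace clustering). The constraint $\operatorname{diag}(C) = 0$ rules out the trivial solution in which each point represents itself.

```latex
% Generic self-expressive model (assumed notation, illustrative only)
\min_{C \in \mathbb{R}^{n \times n}} \;
  \Omega(C) + \frac{\lambda}{2} \, \lVert X - XC \rVert_F^2
\quad \text{s.t.} \quad \operatorname{diag}(C) = 0

% Symmetric affinity matrix passed to spectral clustering
W = \lvert C \rvert + \lvert C \rvert^{\top}

% Kernel-based variant: replace X by a feature map \Phi(X), with K = \Phi(X)^{\top} \Phi(X)
\lVert \Phi(X) - \Phi(X) C \rVert_F^2
  = \operatorname{tr}\!\left( K - 2KC + C^{\top} K C \right)
```

Under these assumptions, spectral clustering applied to $W$ yields the final segmentation. The last identity illustrates why kernel based extensions only need the kernel matrix $K$ and never the feature map $\Phi$ explicitly.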