Finding a suitable data representation for a specific task has been shown to be crucial in many applications. The success of subspace clustering depends on the assumption that the data can be separated into different subspaces. However, this simple assumption does not always hold, since the raw data might not be separable into subspaces. To recover a "clustering-friendly" representation and facilitate the subsequent clustering, we propose a graph filtering approach that yields a smooth representation. Specifically, it injects graph similarity into the data features by applying a low-pass filter, which extracts representations that are useful for clustering. Extensive experiments on image and document clustering datasets demonstrate that our method improves upon state-of-the-art subspace clustering techniques. In particular, its performance is comparable to that of deep learning methods, which underscores the effectiveness of this simple graph filtering scheme for many real-world applications. An ablation study shows that graph filtering can remove noise, preserve image structure, and increase the separability of classes.
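The low-pass filtering step can be illustrated with a minimal sketch. The abstract does not specify the filter or how the graph is built, so everything below is an assumption made for illustration: a symmetric k-nearest-neighbour graph, the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}, and the commonly used filter (I - L/2)^k. The function names `knn_graph`, `normalized_laplacian`, and `low_pass_filter` are hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_graph(X, n_neighbors=5):
    """Symmetric k-nearest-neighbour adjacency matrix (assumed graph construction)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between the rows (samples) of X.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    return np.maximum(A, A.T)  # symmetrise: keep an edge if either side chose it


def normalized_laplacian(A):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def low_pass_filter(X, A, k=2):
    """Apply the assumed filter (I - L/2)^k to the feature matrix X (n samples x d features)."""
    H = np.eye(A.shape[0]) - 0.5 * normalized_laplacian(A)
    X_bar = X.copy()
    for _ in range(k):
        # Each pass mixes every sample's features with those of its graph neighbours.
        X_bar = H @ X_bar
    return X_bar


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 4))      # synthetic data, for illustration only
    A = knn_graph(X, n_neighbors=5)
    X_bar = low_pass_filter(X, A, k=2)
    L = normalized_laplacian(A)
    # Graph smoothness trace(X^T L X): a lower value means a smoother signal.
    print(np.trace(X.T @ L @ X), np.trace(X_bar.T @ L @ X_bar))
```

Because the eigenvalues of the normalized Laplacian lie in [0, 2], the assumed filter scales each graph-frequency component by a factor in [0, 1] and attenuates high frequencies the most, so the smoothness term trace(X^T L X) cannot increase after filtering. The filtered features `X_bar` would then be passed to a subspace clustering method in place of the raw features.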