Non-negative matrix factorization (NMF) learns a subspace by minimizing the Euclidean distance between the data matrix and its low-rank approximation, but it fails on corrupted data because this loss function is sensitive to outliers. In this paper, we propose a Truncated Cauchy loss that handles outliers by truncating large errors, and develop Truncated CauchyNMF to robustly learn the subspace from noisy datasets contaminated by outliers. We theoretically analyze the robustness of Truncated CauchyNMF in comparison with competing models, and prove that Truncated CauchyNMF admits a generalization bound that converges at a rate of order $O(\sqrt{{\ln n}/{n}})$, where $n$ is the sample size. We evaluate Truncated CauchyNMF via image clustering on both simulated and real datasets. Experimental results on datasets containing gross corruptions validate the effectiveness and robustness of Truncated CauchyNMF for learning robust subspaces.
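To make the idea of truncating large errors concrete, the following is a minimal sketch of a truncated Cauchy loss, assuming the common form in which the Cauchy loss $\ln(1 + (e/\gamma)^2)$ is capped at a fixed truncation level for residuals beyond a threshold $\sigma$; the parameter names `gamma` and `sigma` are illustrative, not the paper's notation.

```python
import numpy as np

def truncated_cauchy_loss(residual, gamma=1.0, sigma=3.0):
    """Sketch of a truncated Cauchy loss (hypothetical parameterization):
    behaves like the Cauchy loss ln(1 + (e/gamma)^2) for small residuals,
    but is capped at a constant for |residual| > sigma, so gross outliers
    contribute a bounded penalty and cannot dominate the objective."""
    cauchy = np.log1p((residual / gamma) ** 2)  # standard Cauchy loss
    cap = np.log1p((sigma / gamma) ** 2)        # truncation level
    return np.minimum(cauchy, cap)

# Small residuals are penalized smoothly; the gross outlier is clipped
# to the same bounded value as any residual beyond the threshold.
r = np.array([0.0, 0.5, 1.0, 100.0])
losses = truncated_cauchy_loss(r)
```

Under squared Euclidean loss, the residual of 100 above would contribute 10,000 to the objective; under this truncated loss it contributes only the fixed cap, which is the mechanism by which the subspace estimate remains stable under contamination.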