Given a set of data points belonging to the convex hull of a set of vertices, a key problem in linear algebra, signal processing, data analysis and machine learning is to estimate these vertices in the presence of noise. Many algorithms have been developed under the assumption that each vertex has at least one nearby data point; two of the most widely used ones are vertex component analysis (VCA) and the successive projection algorithm (SPA). This assumption is known as the pure-pixel assumption in blind hyperspectral unmixing, and as the separability assumption in nonnegative matrix factorization. More recently, Bhattacharyya and Kannan (ACM-SIAM Symposium on Discrete Algorithms, 2020) proposed an algorithm for learning a latent simplex (ALLS) that relies on the assumption that each vertex has more than one nearby data point. In that scenario, ALLS is probabilistically more robust to noise than algorithms based on the separability assumption. In this paper, inspired by ALLS, we propose smoothed VCA (SVCA) and smoothed SPA (SSPA), which generalize VCA and SPA by assuming the presence of several data points near each vertex. We illustrate the effectiveness of SVCA and SSPA over VCA, SPA and ALLS on synthetic data sets, on the unmixing of hyperspectral images, and on feature extraction from facial image data sets. In addition, our study highlights new theoretical results for VCA.
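To make the "smoothing" idea concrete, the sketch below shows one plausible way to combine SPA-style vertex extraction with averaging over several nearby data points. This is only an illustrative sketch, not the SSPA algorithm of the paper: the neighborhood size p, the choice of averaging the p closest columns, and the deflation step are all assumptions made here for exposition.

```python
import numpy as np

def smoothed_spa_sketch(X, r, p=5):
    """Illustrative SPA-style vertex extraction with a smoothing step.

    X : (m, n) array whose columns lie approximately in the convex hull
        of r vertices.  p is the assumed number of nearby data points
        averaged per vertex; p = 1 reduces to the usual SPA selection.
    Returns an (m, r) array of estimated vertices.
    """
    R = np.array(X, dtype=float)          # residual matrix, deflated in place
    V = np.zeros((X.shape[0], r))
    for k in range(r):
        # Pick the most "extreme" column of the current residual.
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        # Smoothing step (assumed): average the p original data points
        # closest to the selected column, instead of using it alone.
        dist = np.linalg.norm(X - X[:, j:j + 1], axis=0)
        V[:, k] = X[:, np.argsort(dist)[:p]].mean(axis=1)
        # SPA-style deflation: project the residual onto the orthogonal
        # complement of the selected direction.
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)
    return V
```

With p = 1 this reduces to plain SPA; larger p averages out noise when several data points cluster around each vertex, which is the regime targeted by SVCA and SSPA.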