Given a set of data points belonging to the convex hull of a set of vertices, a key problem in data analysis and machine learning is to estimate these vertices in the presence of noise. Many algorithms have been developed under the assumption that there is at least one data point near each vertex; two of the most widely used ones are vertex component analysis (VCA) and the successive projection algorithm (SPA). This assumption is known as the pure-pixel assumption in blind hyperspectral unmixing, and as the separability assumption in nonnegative matrix factorization. More recently, Bhattacharyya and Kannan (ACM-SIAM Symposium on Discrete Algorithms, 2020) proposed an algorithm for learning a latent simplex (ALLS) that relies on the assumption that there is more than one nearby data point for each vertex. In that scenario, ALLS is probabilistically more robust to noise than algorithms based on the separability assumption. In this paper, inspired by ALLS, we propose smoothed VCA (SVCA) and smoothed SPA (SSPA), which generalize VCA and SPA by assuming the presence of several data points near each vertex. We illustrate the effectiveness of SVCA and SSPA over VCA, SPA and ALLS on synthetic data sets, and on the unmixing of hyperspectral images.
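To make the setting concrete, below is a minimal sketch of the standard SPA and of a "smoothed" variant that averages several data points near each selected extreme column, in the spirit of the idea described above. This is only an illustration under simplifying assumptions (the averaging rule here is a hypothetical simplification, not the paper's exact SSPA; the function names `spa`, `smoothed_spa` and the parameter `p` are ours).

```python
import numpy as np

def spa(X, r):
    """Successive projection algorithm: greedily select r columns of X that
    (approximately) span the vertices of the convex hull of its columns."""
    R = X.astype(float).copy()
    indices = []
    for _ in range(r):
        j = int(np.argmax(np.sum(R**2, axis=0)))  # column with largest residual norm
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)                   # project residual onto span(u)'s orthogonal complement
        indices.append(j)
    return indices

def smoothed_spa(X, r, p=5):
    """Illustrative smoothed variant (assumption, not the authors' SSPA):
    estimate each vertex as the mean of the p data points closest to the
    column selected by SPA, exploiting several nearby points per vertex."""
    vertices = []
    for j in spa(X, r):
        d = np.linalg.norm(X - X[:, [j]], axis=0)  # distances to the selected column
        nearest = np.argsort(d)[:p]
        vertices.append(X[:, nearest].mean(axis=1))
    return np.column_stack(vertices)

# Usage on synthetic data: X = W H + noise, with columns of H on the simplex.
rng = np.random.default_rng(0)
m, r, n = 10, 4, 500
W = rng.random((m, r))                         # ground-truth vertices
H = rng.dirichlet(0.2 * np.ones(r), size=n).T  # convex combinations
X = W @ H + 0.01 * rng.standard_normal((m, n))
W_est = smoothed_spa(X, r, p=10)
```

The point of the sketch is only to show where the extra assumption enters: with several points near each vertex, averaging neighbors can reduce the effect of noise compared to returning a single selected column.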