This paper tackles the problem of robust covariance matrix estimation when the data are incomplete. Classical statistical estimation methods usually rely on the Gaussian assumption, whereas existing robust estimators assume unstructured signal models. The former can be inaccurate on real-world data sets in which heterogeneity produces heavy-tailed distributions, while the latter do not exploit the low-rank structure that signals commonly exhibit. Combining the strengths of both approaches, a covariance matrix estimation procedure is designed on a robust (mixture of scaled Gaussians) low-rank model by leveraging the observed-data likelihood function within an expectation-maximization algorithm. The procedure is also designed to handle a general pattern of missing values. It is first validated on simulated data sets. Its usefulness for classification and clustering applications is then assessed on two real data sets with missing values, which include multispectral and hyperspectral time series.
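To make the idea of maximizing the observed-data likelihood with an expectation-maximization algorithm more concrete, the following is a minimal sketch for the plain Gaussian, full-rank case only. It is not the robust low-rank procedure proposed in the paper: the mixture-of-scaled-Gaussians weights and the low-rank constraint are deliberately left out. The function name `em_covariance_missing`, the NaN encoding of missing entries, and the fixed iteration count are all assumptions made for illustration.

```python
import numpy as np


def em_covariance_missing(X, n_iter=50):
    """Gaussian EM estimate of mean and covariance from data with NaNs.

    Hypothetical helper (not from the paper). Rows of X are samples and
    NaN marks a missing entry; any missingness pattern is allowed.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)

    # Initialise from per-column statistics of the observed entries.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))

    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m = missing[i]
            o = ~m
            x_hat = X[i].copy()
            cond_cov = np.zeros((p, p))
            if m.any():
                if o.any():
                    # E-step: regress the missing block on the observed block.
                    # K = Sigma_mo @ inv(Sigma_oo), computed with a solve.
                    s_oo = sigma[np.ix_(o, o)]
                    s_om = sigma[np.ix_(o, m)]
                    K = np.linalg.solve(s_oo, s_om).T
                    x_hat[m] = mu[m] + K @ (X[i, o] - mu[o])
                    cond_cov[np.ix_(m, m)] = sigma[np.ix_(m, m)] - K @ s_om
                else:
                    # Nothing observed: fall back to the current model.
                    x_hat = mu.copy()
                    cond_cov = sigma.copy()
            # Accumulate expected sufficient statistics E[x] and E[x x^T].
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + cond_cov

        # M-step: closed-form Gaussian updates.
        mu = sum_x / n
        sigma = sum_xx / n - np.outer(mu, mu)
        sigma = (sigma + sigma.T) / 2.0  # guard against round-off asymmetry

    return mu, sigma
```

Calling `em_covariance_missing(X)` on an array whose missing entries are set to `np.nan` returns the estimated mean vector and covariance matrix. With no missing entries, the first iteration already reduces to the usual maximum-likelihood sample mean and covariance. Extending the sketch toward the setting described in the abstract would involve adding per-sample scale weights and a low-rank structure on the covariance, neither of which is attempted here.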