We compare several algorithms for inferring covariance and precision matrices from small datasets of real vectors, with the length and dimension typical of human brain activity time series recorded by functional Magnetic Resonance Imaging (fMRI). Assuming that a Gaussian model underlies the neural activity, the problem consists in denoising the empirically observed matrices to obtain better estimators of the true precision and covariance matrices. We consider several standard noise-cleaning algorithms and compare them on two types of datasets. The first consists of fMRI time series of the brain activity of human subjects at rest. The second consists of synthetic time series sampled from a generative Gaussian model in which we can vary the ratio of dimensions to samples, q = N/T, and the strength of the off-diagonal correlations. The reliability of each algorithm is assessed in terms of the test-set likelihood and, for synthetic data, the distance from the true precision matrix. We observe that the so-called Optimal Rotationally Invariant Estimator, based on Random Matrix Theory, leads to a significantly lower distance from the true precision matrix in synthetic data and to a higher test likelihood in natural fMRI data. We propose a variant of the Optimal Rotationally Invariant Estimator in which one of its parameters is optimised by cross-validation. In the severe undersampling regime (large q) typical of fMRI series, this variant outperforms all the other estimators. We furthermore propose a simple algorithm based on iterative likelihood gradient ascent, which provides an accurate estimate for weakly correlated datasets.
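To make the test-set likelihood criterion concrete, the following is a minimal sketch rather than the estimators studied in the paper. It assumes zero-mean Gaussian data and compares the inverse of the raw sample covariance with a simple linear shrinkage towards a scaled identity, used here only as a stand-in for a generic noise-cleaning step. The function names, the equicorrelated true covariance, and the values of `rho` and `alpha` are all illustrative assumptions.

```python
import numpy as np


def sample_covariance(X):
    """Empirical covariance of X (shape T x N), assuming zero-mean data."""
    T = X.shape[0]
    return X.T @ X / T


def linear_shrinkage(C, alpha):
    """Blend C with a scaled identity target (an illustrative cleaning scheme)."""
    N = C.shape[0]
    target = (np.trace(C) / N) * np.eye(N)
    return (1.0 - alpha) * C + alpha * target


def mean_test_loglik(J, X_test):
    """Average zero-mean Gaussian log-likelihood per test sample for precision J."""
    T, N = X_test.shape
    sign, logdet = np.linalg.slogdet(J)
    if sign <= 0:
        raise ValueError("precision matrix must be positive definite")
    C_test = X_test.T @ X_test / T
    # (1/T) * sum_t log N(x_t | 0, J^{-1}) written via the test covariance
    return 0.5 * (logdet - np.trace(J @ C_test) - N * np.log(2.0 * np.pi))


def demo(N=20, T_train=40, T_test=2000, rho=0.2, alpha=0.3, seed=0):
    """Compare raw and shrunk estimators at q = N / T_train (assumed setup)."""
    rng = np.random.default_rng(seed)
    # Assumed ground truth: unit variances, uniform off-diagonal correlation rho
    C_true = (1.0 - rho) * np.eye(N) + rho * np.ones((N, N))
    X_train = rng.multivariate_normal(np.zeros(N), C_true, size=T_train)
    X_test = rng.multivariate_normal(np.zeros(N), C_true, size=T_test)
    C_emp = sample_covariance(X_train)
    results = {}
    for name, C in [("raw", C_emp), ("shrunk", linear_shrinkage(C_emp, alpha))]:
        J = np.linalg.inv(C)  # precision matrix implied by the covariance estimate
        results[name] = mean_test_loglik(J, X_test)
    return results


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: mean test log-likelihood = {value:.3f}")
```

Calling `demo()` with a smaller `T_train` raises q = N/T and moves the comparison towards the undersampled regime. Note that the raw estimator requires `T_train > N` for the sample covariance to be invertible, so this sketch does not cover q ≥ 1.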