Covariance matrix estimation is an important task in the analysis of multivariate data in disparate scientific fields, including neuroscience, genomics, and astronomy. However, modern scientific data are often incomplete due to factors beyond the control of researchers, and data missingness may prohibit the use of traditional covariance estimation methods. Some existing methods address this problem by completing the data matrix, or by filling the missing entries of an incomplete sample covariance matrix by assuming a low-rank structure. We propose a novel approach that exploits auxiliary variables to complete covariance matrix estimates. An example of auxiliary variable is the distance between neurons, which is usually inversely related to the strength of neuronal correlation. Our method extracts auxiliary information via regression, and involves a single tuning parameter that can be selected empirically. We compare our method with other matrix completion approaches via simulations, and apply it to the analysis of large-scale neuroscience data.
翻译:暂无翻译