Matrix completion has attracted attention in many fields, including statistics, applied mathematics, and electrical engineering. Most existing work focuses on independent sampling models, under which the observed entries are sampled independently. Motivated by the integration of knowledge graphs derived from multi-source biomedical data, such as Electronic Health Records (EHR) and biomedical text, we propose {\bf B}lock-wise {\bf O}verlapping {\bf N}oisy {\bf M}atrix {\bf I}ntegration (BONMI) to handle block-wise missingness in symmetric matrices that represent relatedness between entity pairs. Our idea is to use the orthogonal Procrustes problem to align the eigenspaces of the two overlapping sub-matrices, and then to complete the missing blocks with the inner product of the two aligned low-rank components. In addition, we establish a statistical rate for estimating the eigenspace of the underlying matrix, which is comparable to the rate under the independent-missingness assumption. Simulation studies show that the method performs well under a variety of configurations. In the real data analysis, the method is applied to two tasks: (i) the integration of several point-wise mutual information matrices built from English EHR data and Chinese medical text data, and (ii) machine translation between English and Chinese medical concepts. In both tasks, our method shows an advantage over existing methods.
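The align-then-complete idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a noise-free, positive semidefinite matrix of known rank, two blocks whose shared indices sit at the end of the first block and the start of the second, and an overlap large enough to have full rank. The function names (`low_rank_factor`, `procrustes_rotation`, `complete_missing_block`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def low_rank_factor(block, rank):
    """Return X with block ~= X @ X.T from the top-`rank` eigenpairs.

    Assumption: the block is symmetric positive semidefinite.
    """
    vals, vecs = np.linalg.eigh(block)
    top = np.argsort(vals)[::-1][:rank]
    # Scale each eigenvector by the square root of its (clipped) eigenvalue.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def procrustes_rotation(source, target):
    """Orthogonal O minimizing ||source @ O - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def complete_missing_block(block1, block2, n_overlap, rank):
    """Estimate the unobserved block linking the two non-shared index sets.

    Assumed layout: the last `n_overlap` rows of block1 and the first
    `n_overlap` rows of block2 refer to the same entities.
    """
    x1 = low_rank_factor(block1, rank)
    x2 = low_rank_factor(block2, rank)
    # Align the second factor to the first using the shared rows only.
    rot = procrustes_rotation(x2[:n_overlap], x1[-n_overlap:])
    # Inner product of the aligned low-rank components.
    return x1[:-n_overlap] @ (x2[n_overlap:] @ rot).T


# Toy check on a synthetic rank-3 matrix with 12 entities.
rng = np.random.default_rng(0)
z = rng.standard_normal((12, 3))
w = z @ z.T                      # full matrix, never fully observed
block1 = w[:8, :8]               # entities 0..7
block2 = w[4:, 4:]               # entities 4..11 (4..7 are shared)
estimate = complete_missing_block(block1, block2, n_overlap=4, rank=3)
truth = w[:4, 8:]                # block observed by neither source
max_error = float(np.max(np.abs(estimate - truth)))
```

In this idealized setting the missing block is recovered up to floating-point error, so `max_error` should be negligible. With noisy blocks, or with indefinite matrices such as point-wise mutual information matrices, the factorization step would need to be adapted, and this sketch makes no claim about how the paper handles those cases.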