In clinical practice and biomedical research, measurements are often collected sparsely and irregularly in time because data acquisition is expensive and inconvenient. Examples include measurements of spine bone mineral density, cancer growth assessed through mammography or biopsy, progression of visual impairment, and assessment of gait in patients with neurological disorders. Estimating progression from such sparse observations is therefore of great interest to practitioners. From a statistical standpoint, such data are often analyzed with a mixed-effects model in which time is treated as both a fixed effect (the population progression curve) and a random effect (individual variability). Alternatively, researchers use Gaussian processes or functional data analysis, where observations are assumed to be drawn from a certain distribution of processes. These models are flexible, but they rely on probabilistic assumptions, require careful problem-specific implementation, and tend to be slow in practice. In this study, we propose an alternative elementary framework for analyzing longitudinal data, relying on matrix completion. Our method yields estimates of progression curves by iterative application of the Singular Value Decomposition. Our framework covers multivariate longitudinal data and regression, and it can be easily extended to other settings. Because it relies on existing tools for matrix algebra, it is efficient and easy to implement. We apply our methods to understand trends in the progression of motor impairment in children with Cerebral Palsy. Our model approximates individual progression curves and explains 30% of the variability. A low-rank representation of progression trends enables identification of distinct progression trends in subtypes of Cerebral Palsy.
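To make the idea of matrix completion by iterative SVD concrete, the following is a minimal sketch and not the exact algorithm of this study. It assumes that observations are arranged in a subjects-by-time-grid matrix with `NaN` for missing entries, and it uses a generic soft-thresholded SVD iteration. The function name `soft_impute`, the penalty `lam`, and the synthetic data are hypothetical choices made for illustration only.

```python
import numpy as np


def soft_impute(Y, lam=1.0, max_iter=200, tol=1e-6):
    """Low-rank estimate of Y (NaN = missing) via iterated soft-thresholded SVD.

    Illustrative sketch only: a generic completion scheme, not the paper's method.
    """
    observed = ~np.isnan(Y)
    Z = np.zeros(Y.shape)  # current low-rank estimate
    for _ in range(max_iter):
        # Keep observed values; fill the gaps with the current estimate.
        filled = np.where(observed, Y, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Shrink singular values toward zero, which encourages low rank.
        s = np.maximum(s - lam, 0.0)
        Z_new = (U * s) @ Vt
        # Relative change between successive estimates.
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


# Synthetic example (assumed data): 30 subjects with linear trends on a
# 12-point time grid, of which roughly 30% of entries are observed.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 12)
truth = np.outer(rng.normal(1.0, 0.3, size=30), t)
Y = np.where(rng.random(truth.shape) < 0.3, truth, np.nan)
Z = soft_impute(Y, lam=0.1)  # each row of Z is an estimated progression curve
```

In this sketch, larger values of `lam` shrink more singular values to zero and so yield smoother, lower-rank estimates; in practice the penalty would typically be chosen by cross-validation on held-out observations.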