Modern longitudinal data, such as those collected by wearable devices, measure biological signals on a fixed set of participants at a diverging number of time points. Traditional statistical methods are not equipped to handle the computational burden of repeatedly reanalyzing the cumulatively growing dataset each time new data are collected. We propose a new estimation and inference framework for dynamically updating point estimates and their standard errors across serially collected, dependent datasets. The key technique is a decomposition of the extended score function of the quadratic inference function, constructed over the cumulative longitudinal data, into a sum of summary statistics over data batches. We show how this sum can be updated recursively without access to the whole dataset, yielding a computationally efficient streaming procedure with minimal loss of statistical efficiency. We prove consistency and asymptotic normality of our streaming estimator as the number of data batches diverges, even when the number of independent participants remains fixed. Simulations highlight the advantages of our approach over traditional statistical methods that assume independence between data batches. Finally, we investigate the relationship between physical activity and several diseases by analyzing accelerometry data from the National Health and Nutrition Examination Survey.