Modern longitudinal data from wearable devices consist of biological signals recorded at high-frequency time points. Distributed statistical methods have emerged as a powerful tool for overcoming the computational burden of estimation and inference with large data, but methodology for distributed functional regression remains limited. We propose a distributed estimation and inference procedure that efficiently estimates both functional and scalar parameters from intensively measured longitudinal outcomes. The procedure overcomes computational difficulties through a scalable divide-and-conquer algorithm that partitions the outcomes into smaller subsets. We circumvent traditional basis selection problems by analyzing the data with quadratic inference functions within these smaller subsets, so that the basis functions in each subset are low-dimensional. To address the challenge of combining estimates from dependent subsets, we propose a statistically efficient one-step estimator derived from a constrained generalized method of moments objective function with a smoothing penalty. We show theoretically and numerically that the proposed estimator is as statistically efficient as non-distributed alternatives while being more computationally efficient. We demonstrate the practicality of our approach with an analysis of accelerometer data from the National Health and Nutrition Examination Survey.
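The abstract does not give the form of the one-step estimator. As a rough illustration of why dependence between subsets matters when pooling their estimates, the Python sketch below combines K subset estimates of a shared parameter by minimizing a GMM-style quadratic form, (θ̃ − Aθ)ᵀΣ⁻¹(θ̃ − Aθ), where θ̃ stacks the subset estimates, A stacks identity matrices, and Σ is the joint covariance of the stacked estimates, including the cross-subset blocks. This is a standard generalized least squares (minimum-distance) combination and is not the authors' estimator. It omits the functional components, the quadratic inference functions, the constraints, and the smoothing penalty, and it assumes Σ is already available. The helper name `combine_dependent_estimates` is hypothetical.

```python
import numpy as np


def combine_dependent_estimates(estimates, joint_cov):
    """Hypothetical helper: minimum-distance (GLS) combination of K subset
    estimates of one common p-dimensional parameter.

    estimates: sequence of K arrays, each of shape (p,)
    joint_cov: (K*p, K*p) covariance of the stacked estimates; the
               off-diagonal blocks carry the cross-subset dependence.
    Returns (combined estimate of shape (p,), its (p, p) covariance).
    """
    stacked = np.concatenate(
        [np.atleast_1d(np.asarray(e, dtype=float)) for e in estimates]
    )
    k = len(estimates)
    p = stacked.size // k
    # A maps the common parameter to its K copies: K stacked p x p identities.
    A = np.tile(np.eye(p), (k, 1))
    joint_cov = np.asarray(joint_cov, dtype=float)
    # Apply the weight matrix W = joint_cov^{-1} through linear solves.
    W_A = np.linalg.solve(joint_cov, A)
    W_theta = np.linalg.solve(joint_cov, stacked)
    # Minimize (stacked - A theta)^T W (stacked - A theta) over theta.
    info = A.T @ W_A
    theta = np.linalg.solve(info, A.T @ W_theta)
    cov = np.linalg.inv(info)
    return theta, cov


if __name__ == "__main__":
    # Two dependent subsets estimating a scalar parameter (illustrative numbers).
    ests = [np.array([1.0]), np.array([3.0])]
    sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
    theta, cov = combine_dependent_estimates(ests, sigma)
    print(theta, cov)
```

With two equally variable estimates, 1.0 and 3.0, correlated at 0.5, the combination is 2.0 with variance 0.75. Treating the subsets as independent would instead report a variance of 0.5 and so would understate the uncertainty. This is the kind of error that a combination rule accounting for cross-subset dependence is meant to avoid.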