Scalable Estimation and Inference with Large-scale or Online Survival Data

With the rapid development of data collection and aggregation technologies in many scientific disciplines, large-scale or online regression is increasingly used to analyze real-world data and generate real-world evidence. In such applications, it is often numerically challenging, and sometimes infeasible, to store the entire dataset in memory. Consequently, classical batch-based estimation methods that require the entire dataset are less attractive or no longer applicable. Recursive estimation methods such as stochastic gradient descent, which process data points sequentially, are more appealing because they are both numerically convenient and memory efficient. In this paper, for scalable estimation with large or online survival data, we propose a stochastic gradient descent method that recursively updates the estimates as data points arrive sequentially in streams. Theoretical results such as asymptotic normality and estimation efficiency are established to justify its validity. Furthermore, to quantify the uncertainty associated with the proposed stochastic gradient descent estimator and facilitate statistical inference, we develop a scalable resampling strategy tailored to the large-scale or online setting. Simulation studies and a real data application are also provided to assess its performance and illustrate its practical utility.
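
To make the idea of a recursive update with resampling-based uncertainty concrete, here is a minimal sketch. The abstract does not state the survival model, the step-size schedule, or the resampling scheme, so everything below is an assumption chosen for illustration: an exponential proportional hazards model with right censoring, averaged stochastic gradient descent, a projection onto a ball as a numerical safeguard, and random-weight perturbation of the gradient (weights with mean 1 and variance 1) as a stand-in resampling strategy. The names `simulate_stream`, `project`, and `online_sgd` are hypothetical and do not come from the paper.

```python
import numpy as np


def simulate_stream(n, beta_true, censor_time=2.0, seed=1):
    """Yield (x, t, delta) one at a time: covariates, observed time, event flag.

    Assumed toy model: hazard exp(x @ beta_true), administrative censoring.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0, size=beta_true.shape[0])
        event_time = rng.exponential(1.0 / np.exp(x @ beta_true))
        t = min(event_time, censor_time)
        yield x, t, float(event_time <= censor_time)


def project(b, radius):
    """Shrink b (or each row of b) onto a ball; a safeguard, not part of the model."""
    norms = np.linalg.norm(b, axis=-1, keepdims=True)
    return b * np.minimum(1.0, radius / np.maximum(norms, 1e-12))


def online_sgd(stream, dim, n_paths=50, scale=0.2, decay=0.6, radius=3.0, seed=0):
    """Averaged SGD plus randomly weighted SGD paths, all updated in one pass.

    Only the current iterates and running averages are kept in memory;
    no data point is stored after it has been processed.
    """
    rng = np.random.default_rng(seed)
    beta = np.zeros(dim)                  # current SGD iterate
    beta_bar = np.zeros(dim)              # running average of the iterates
    pert = np.zeros((n_paths, dim))       # perturbed SGD iterates
    pert_bar = np.zeros((n_paths, dim))   # their running averages

    for n, (x, t, delta) in enumerate(stream, start=1):
        step = scale * n ** (-decay)      # decaying step size (assumed schedule)

        # Score of the exponential log-likelihood: (delta - t * exp(x'beta)) * x
        score = (delta - t * np.exp(x @ beta)) * x
        beta = project(beta + step * score, radius)
        beta_bar += (beta - beta_bar) / n

        # Each perturbed path sees the same data point with its own random weight.
        w = rng.exponential(1.0, size=n_paths)
        resid = delta - t * np.exp(pert @ x)
        pert = project(pert + step * (w * resid)[:, None] * x, radius)
        pert_bar += (pert - pert_bar) / n

    # Spread of the perturbed averages serves as a standard-error estimate.
    se = pert_bar.std(axis=0, ddof=1)
    return beta_bar, se


beta_true = np.array([0.5, -0.5])
beta_hat, se = online_sgd(simulate_stream(10000, beta_true), dim=2)
print("estimate:", beta_hat)
print("std. err.:", se)
```

In this sketch, memory use depends on the number of covariates and perturbed paths but not on the number of observations, which is the property that makes recursive estimation attractive for streaming data. A rough Wald-type interval could then be formed as `beta_hat ± 1.96 * se`, again only under the assumptions stated above.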
