We develop a representation of Gaussian-distributed, sparsely sampled longitudinal data whereby the data for each subject are mapped to a multivariate Gaussian distribution; this map is entirely data-driven. The proposed method uses functional principal component analysis (FPCA) and is nonparametric, assuming no prior knowledge of the covariance or mean structure of the longitudinal data. This approach leads naturally to a closer investigation of how the functional principal component (FPC) scores obtained for longitudinal data behave as the number of observations per subject increases from sparse to dense. We show how this is reflected in the shrinkage of the conditional distribution of the scores, given the noisy longitudinal observations, towards a point mass located at the true but unobservable FPC scores. Mapping each subject's sparse observations to the corresponding conditional score distribution yields useful visualizations and representations of sparse longitudinal data. Asymptotic rates of convergence as the sample size increases are obtained for the 2-Wasserstein metric between the true and estimated conditional score distributions, both for a $K$-truncated FPC representation and for the case in which $K=K(n)$ diverges as the sample size $n\to\infty$. We apply these ideas to construct predictive distributions for predicting outcomes from sparse longitudinal data.
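To make the subject-to-distribution map concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the Gaussian model $Y_{ij} = \mu(t_{ij}) + \sum_{k=1}^K \xi_{ik}\phi_k(t_{ij}) + \varepsilon_{ij}$ and treats the mean function, eigenfunctions, eigenvalues and noise variance as known; in practice these would be replaced by FPCA estimates. Under these assumptions, the standard Gaussian conditioning formulas give $E[\xi_i \mid Y_i] = \Lambda\Phi_i^\top\Sigma_{Y_i}^{-1}(Y_i-\mu_i)$ and $\mathrm{Cov}[\xi_i \mid Y_i] = \Lambda - \Lambda\Phi_i^\top\Sigma_{Y_i}^{-1}\Phi_i\Lambda$. The 2-Wasserstein distance between two Gaussians has the closed form $\|m_1-m_2\|^2 + \mathrm{tr}\big(S_1+S_2-2(S_2^{1/2}S_1S_2^{1/2})^{1/2}\big)$ (squared). The function names (`conditional_scores`, `w2_gaussian`, `psd_sqrt`) and the toy model (mean $t$, three sine eigenfunctions, the chosen `lam` and `sigma2`) are hypothetical illustrations.

```python
import numpy as np


def psd_sqrt(a):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def conditional_scores(t, y, mu, phis, lam, sigma2):
    """Gaussian conditional distribution of K FPC scores given noisy values y at times t.

    Assumes y_j = mu(t_j) + sum_k xi_k phi_k(t_j) + eps_j, with
    xi ~ N(0, diag(lam)) and eps_j ~ N(0, sigma2) independent.
    """
    Phi = np.column_stack([phi(t) for phi in phis])        # n_i x K design matrix
    Lam = np.diag(lam)                                     # K x K prior score covariance
    Sigma_y = Phi @ Lam @ Phi.T + sigma2 * np.eye(len(t))  # covariance of y
    G = np.linalg.solve(Sigma_y, Phi @ Lam).T              # Lam Phi^T Sigma_y^{-1}
    mean = G @ (y - mu(t))
    cov = Lam - G @ Phi @ Lam
    return mean, (cov + cov.T) / 2.0                       # symmetrize round-off


def w2_gaussian(m1, s1, m2, s2):
    """Closed-form 2-Wasserstein distance between N(m1, s1) and N(m2, s2)."""
    r2 = psd_sqrt(s2)
    cross = psd_sqrt(r2 @ s1 @ r2)
    d2 = np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross)
    return float(np.sqrt(max(d2, 0.0)))


# Hypothetical toy model: mean t and three orthonormal sine eigenfunctions on [0, 1].
def mu(t):
    return t


phis = [lambda t, k=k: np.sqrt(2.0) * np.sin(2.0 * np.pi * k * t) for k in (1, 2, 3)]
lam = np.array([4.0, 1.0, 0.25])
sigma2 = 0.25

rng = np.random.default_rng(0)
xi_true = rng.normal(scale=np.sqrt(lam))  # one subject's unobservable scores
t_all = rng.uniform(size=320)
y_all = (mu(t_all) + sum(x * phi(t_all) for x, phi in zip(xi_true, phis))
         + rng.normal(scale=np.sqrt(sigma2), size=t_all.size))

traces, dists = [], []
for n in (5, 20, 80, 320):  # nested observation sets: sparse -> dense
    m, s = conditional_scores(t_all[:n], y_all[:n], mu, phis, lam, sigma2)
    traces.append(float(np.trace(s)))
    # Distance to the point mass at xi_true, i.e. a Gaussian with zero covariance.
    dists.append(w2_gaussian(m, s, xi_true, np.zeros((3, 3))))
    print(f"n={n:4d}  trace(cov)={traces[-1]:.4f}  W2 to point mass={dists[-1]:.4f}")
```

The conditional covariance depends only on the observation times. Conditioning on a nested, growing set of times can therefore only shrink it, which mirrors the sparse-to-dense shrinkage described above. The distance to the point mass combines this shrinkage with the error of the conditional mean.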