This work considers the problem of fitting functional models to sparsely and irregularly sampled functional data. It addresses the limitations of state-of-the-art methods, which face major challenges when fitting more complex non-linear models. Currently, many of these models cannot be consistently estimated unless the number of observed points per curve grows sufficiently quickly with the sample size. In contrast, we show numerically that a modified approach using more modern multiple imputation methods can produce better estimates in general. We also propose a new imputation approach that combines the ideas of {\it MissForest} and {\it Local Linear Forest}, and we compare the performance of these approaches with {\it PACE} and several other multivariate multiple imputation methods. This work is motivated by a longitudinal study on smoking cessation, in which Electronic Health Records (EHR) from Penn State PaTH to Health allow for the collection of a great deal of data with highly variable sampling. To illustrate our approach, we explore the relation between relapse and diastolic blood pressure. We also consider a variety of simulation schemes with varying levels of sparsity to validate our methods.