A frequent problem in statistical science is how to properly handle missing data in matched-pair observations. A large body of literature addresses the univariate case. Yet ongoing technological progress in measuring biological systems raises the need to handle more complex data such as graphs, strings, and probability distributions. To fill this gap, this paper proposes new estimators of the maximum mean discrepancy (MMD) for complex matched pairs with missing data. These estimators can detect differences in data distributions under different missingness mechanisms. The validity of the approach is proven, statistical consistency results are provided, and its behavior is studied further in an extensive simulation study. Data from continuous glucose monitoring in a longitudinal population-based diabetes study illustrate the application of the approach. By combining the new distributional representations with cluster analysis, new clinical criteria on how glucose changes vary at the distributional level over five years can be explored.
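For orientation, the quantity being estimated can be sketched in a few lines. The block below is not the estimator proposed in this paper. It is the standard biased (V-statistic) estimate of the squared MMD with a Gaussian kernel, applied after a naive complete-case deletion of pairs with a missing member, which is the kind of baseline the new estimators are meant to improve on. The function names, the Gaussian kernel, the `bandwidth` parameter, and the use of NaN to encode missingness are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

def complete_case_mmd2(x, y, bandwidth=1.0):
    """Naive baseline (assumption, not the paper's method): drop every pair
    in which either member contains a missing value (NaN), then compute MMD^2."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    keep = ~(np.isnan(x).any(axis=1) | np.isnan(y).any(axis=1))
    return mmd2_biased(x[keep], y[keep], bandwidth)
```

Complete-case deletion discards the observed member of every incomplete pair and is generally only justified when data are missing completely at random, which is why estimators that remain valid under other missingness mechanisms are of interest.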