This paper addresses the problem of constructing a non-parametric predictor when the latent variables are observed with incomplete information. A convenient predictor for this task is the random forest algorithm used in conjunction with the so-called CART criterion. The proposed technique performs a partial imputation of the missing values in the data set in a way that yields both a consistent estimator of the regression function and a partial recovery of the missing values. Consistency of the random forest estimator is proved in the case where each latent variable is missing completely at random (MCAR).
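To make the MCAR assumption concrete, the following minimal sketch masks each entry of a toy data set independently of its value and of all other entries. It illustrates only the missingness mechanism, not the paper's imputation procedure or its random forest estimator. The function name `mcar_mask`, the missing probability of 0.2, and the toy data are hypothetical choices made for illustration.

```python
import random


def mcar_mask(rows, p_missing, seed=0):
    """Return a copy of `rows` in which each entry is replaced by None
    independently with probability `p_missing`.

    The decision to mask an entry ignores its value and every other
    entry, which is what "missing completely at random" means.
    """
    rng = random.Random(seed)  # local generator, so the result is reproducible
    return [
        [None if rng.random() < p_missing else value for value in row]
        for row in rows
    ]


# Hypothetical toy data: 1000 observations of two covariates.
rows = [[float(i), float(i) ** 2] for i in range(1000)]
masked = mcar_mask(rows, p_missing=0.2)

# Empirical fraction of masked entries; it should be close to p_missing.
n_missing = sum(value is None for row in masked for value in row)
frac_missing = n_missing / (len(rows) * len(rows[0]))
```

Under this mechanism the observed entries are a random subsample of the full data, which is the setting in which the consistency result is stated.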