Incomplete covariate vectors are known to be problematic for estimation and inference on model parameters, but their impact on prediction performance is less well understood. We develop an imputation-free method that builds on a random partition model admitting variable-dimension covariates. Cluster-specific response models further incorporate covariates via linear predictors, facilitating estimation of smooth prediction surfaces with relatively few clusters. We exploit marginalization properties of Gaussian kernels to analytically project response distributions according to any pattern of missing covariates, yielding a local regression with internally consistent uncertainty propagation that uses only one set of coefficients per cluster. Aggressive shrinkage of these coefficients regulates uncertainty due to missing covariates. The method allows in- and out-of-sample prediction for any missingness pattern, even if the pattern in a new subject's incomplete covariate vector was not seen in the training data. We develop an MCMC algorithm for posterior sampling that improves a computationally expensive update for latent cluster allocation. Finally, we demonstrate the model's effectiveness for nonlinear point and density prediction under various circumstances by comparing it, on synthetic and real data, with other recent methods for regression with variable-dimension covariates.
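The analytic projection mentioned above can be illustrated with a standard Gaussian identity. The following is a minimal sketch for a single cluster, not the authors' implementation: it assumes the cluster's covariates follow a joint Gaussian kernel with mean `mu` and covariance `Sigma`, and that the response is Gaussian around a linear predictor with intercept `beta0`, coefficients `beta`, and variance `sigma2`. The function name `project_response` and all parameter names are hypothetical, and missing covariates are encoded as NaN purely for illustration.

```python
import numpy as np

def project_response(x, beta0, beta, sigma2, mu, Sigma):
    """Mean and variance of y given only the observed entries of x.

    Assumed (illustrative) single-cluster model:
        x ~ N(mu, Sigma),   y | x ~ N(beta0 + x @ beta, sigma2).
    Missing entries of x are marked with NaN and integrated out analytically.
    """
    x = np.asarray(x, dtype=float)
    beta = np.asarray(beta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)

    mis = np.isnan(x)
    obs = ~mis

    # Contribution of the observed covariates to the linear predictor.
    mean = beta0 + x[obs] @ beta[obs]
    var = sigma2

    if mis.any():
        # Distribution of the missing covariates; marginal by default.
        mu_m = mu[mis]
        S_mm = Sigma[np.ix_(mis, mis)]
        if obs.any():
            # Condition the missing block on the observed block.
            S_mo = Sigma[np.ix_(mis, obs)]
            S_oo = Sigma[np.ix_(obs, obs)]
            K = np.linalg.solve(S_oo, S_mo.T).T  # S_mo @ inv(S_oo)
            mu_m = mu_m + K @ (x[obs] - mu[obs])
            S_mm = S_mm - K @ S_mo.T
        # Integrating out x_mis shifts the mean and inflates the variance.
        mean = mean + beta[mis] @ mu_m
        var = var + beta[mis] @ S_mm @ beta[mis]

    return float(mean), float(var)


# Example with two covariates, the second one missing.
m, v = project_response(
    x=[1.0, np.nan], beta0=1.0, beta=[2.0, -1.0], sigma2=0.5,
    mu=[0.0, 3.0], Sigma=np.eye(2),
)
print(m, v)
```

Under these assumptions the same coefficient vector serves every missingness pattern, and the added variance term grows with the magnitude of the coefficients on the missing covariates, which is one way to see why shrinking those coefficients limits the uncertainty attributed to missingness.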