In this manuscript, we study the problem of scalar-on-distribution regression; that is, settings in which subject-specific distributions or densities (or, in practice, repeated measures drawn from those distributions) are the covariates related to a scalar outcome through a regression model. We propose a direct regression for such distribution-valued covariates that avoids estimating subject-specific densities and instead uses the observed repeated measures directly as covariates. The model is invariant to any transformation or ordering of the repeated measures. Endowing the regression function with a Gaussian Process prior, we obtain closed-form or conjugate Bayesian inference. Our method subsumes standard Bayesian nonparametric regression using Gaussian Processes as a special case. Theoretically, we show that the method can achieve an optimal estimation error bound. To our knowledge, this is the first theoretical study of Bayesian regression with distribution-valued covariates. Through simulation studies and the analysis of an activity count dataset, we demonstrate that our method outperforms approaches that require an intermediate density estimation step.
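The abstract does not state which kernel or likelihood the method uses. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation. It assumes scalar repeated measures per subject, a kernel-mean-embedding "set kernel" (a Gaussian base kernel averaged over all pairs of observations), a zero-mean Gaussian Process prior, and Gaussian observation noise. Under these assumptions the posterior mean has a closed form. The set kernel does not depend on the order of a subject's measures, and subjects can contribute different numbers of measures without any density estimation step. In this sketch, a subject with a single measure reduces the set kernel to the ordinary Gaussian kernel, which loosely parallels the special-case claim above. The names `set_kernel`, `gram`, `gp_posterior_mean`, `bandwidth`, and `noise` are hypothetical.

```python
import numpy as np

def set_kernel(A, B, bandwidth=1.0):
    """Hypothetical set kernel: Gaussian base kernel averaged over all pairs.

    This is the inner product of empirical kernel mean embeddings, so it is
    positive semidefinite and invariant to the ordering of A and B.
    """
    a = np.asarray(A, dtype=float)[:, None]
    b = np.asarray(B, dtype=float)[None, :]
    return float(np.mean(np.exp(-(a - b) ** 2 / (2.0 * bandwidth ** 2))))

def gram(sets_a, sets_b, bandwidth=1.0):
    """Matrix of set-kernel values between two lists of subjects."""
    return np.array([[set_kernel(A, B, bandwidth) for B in sets_b] for A in sets_a])

def gp_posterior_mean(train_sets, y, test_sets, noise=0.1, bandwidth=1.0):
    """Closed-form GP posterior mean, assuming a zero prior mean and N(0, noise^2) errors."""
    K = gram(train_sets, train_sets, bandwidth)
    K_star = gram(test_sets, train_sets, bandwidth)
    alpha = np.linalg.solve(K + noise ** 2 * np.eye(len(train_sets)), np.asarray(y, dtype=float))
    return K_star @ alpha

# Simulated example (assumed setup): each subject's outcome is the mean of its
# distribution, and subjects have different numbers of repeated measures.
rng = np.random.default_rng(0)
mus = np.linspace(-2.0, 2.0, 30)
train_sets = [rng.normal(mu, 1.0, size=rng.integers(20, 60)) for mu in mus]
test_sets = [rng.normal(-1.5, 1.0, size=40), rng.normal(1.5, 1.0, size=40)]
pred = gp_posterior_mean(train_sets, mus, test_sets)
```

Averaging the base kernel over pairs is one standard way to obtain a kernel on samples that ignores their order. Other choices, such as kernels on quantile functions, would also fit this description, and the abstract does not say which one the paper adopts.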