Increasingly complex data analysis tasks motivate the study of the dependency of distributions of multivariate continuous random variables on scalar or vector predictors. Statistical regression models for distributional responses so far have primarily been investigated for the case of one-dimensional response distributions. We investigate here the case of multivariate response distributions while adopting the 2-Wasserstein metric in the distribution space. The challenge is that, unlike the situation in the univariate case, the optimal transports that correspond to geodesics in the space of distributions with the 2-Wasserstein metric do not have an explicit representation for multivariate distributions. We show that, under some regularity assumptions, the conditional Wasserstein barycenters constructed for a geodesic in the Euclidean predictor space form a corresponding geodesic in the Wasserstein distribution space, and we demonstrate how the notion of conditional barycenters can be harnessed to interpolate as well as extrapolate multivariate distributions. The utility of distributional inter- and extrapolation is explored in simulations and examples. We study both global parametric-like and local smoothing-like models to implement conditional Wasserstein barycenters and establish asymptotic convergence properties for the corresponding estimates. For algorithmic implementation, we make use of an entropy-penalized Sinkhorn algorithm. Conditional Wasserstein barycenters and distribution extrapolation are illustrated with applications in climate science and studies of aging.
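As a rough illustration of the entropy-penalized Sinkhorn idea mentioned above, the following minimal sketch computes a regularized transport plan between two bivariate point clouds and uses it to form an interpolated distribution between them. It is not the authors' implementation: the function names `sinkhorn_plan` and `interpolate`, the regularization value `reg`, the iteration count, and the simulated data are all assumptions chosen for illustration, and the interpolation shown is a simple plan-based displacement interpolation between two distributions rather than a conditional barycenter estimate.

```python
import numpy as np


def sinkhorn_plan(a, b, cost, reg=0.1, n_iter=1000):
    """Entropy-regularized transport plan between weight vectors a and b.

    Hypothetical helper: plain Sinkhorn scaling iterations on the Gibbs kernel.
    """
    K = np.exp(-cost / reg)              # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def interpolate(plan, x, y, t):
    """Plan-based displacement interpolation at time t in [0, 1].

    Returns support points (1 - t) * x_i + t * y_j and their weights plan_ij.
    """
    pts = (1.0 - t) * x[:, None, :] + t * y[None, :, :]
    return pts.reshape(-1, x.shape[1]), plan.reshape(-1)


# Simulated bivariate samples (illustrative data only).
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
y = rng.normal(loc=1.0, size=(40, 2))
a = np.full(30, 1.0 / 30)                # uniform weights on x
b = np.full(40, 1.0 / 40)                # uniform weights on y

# Squared Euclidean cost, scaled to [0, 1] for numerical stability.
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

plan = sinkhorn_plan(a, b, cost)
mid_pts, mid_w = interpolate(plan, x, y, t=0.5)
```

Smaller values of `reg` bring the plan closer to the unregularized optimal transport plan but slow convergence and can cause numerical underflow in the kernel, so the value used here is only a convenient starting point.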