We introduce a novel framework for nonlinear sufficient dimension reduction in which both the predictor and the response are distributional data, modeled as members of a metric space. The key step in achieving nonlinear sufficient dimension reduction is to build universal kernels on these metric spaces, which yields reproducing kernel Hilbert spaces for the predictor and response that are rich enough to characterize the conditional independence determining sufficient dimension reduction. For univariate distributions, we construct the universal kernel using the well-known quantile representation of the Wasserstein distance; for multivariate distributions, we resort to the recently developed sliced Wasserstein distance. Since the sliced Wasserstein distance is computed by aggregating quantile representations of univariate Wasserstein distances, the computational cost for multivariate distributions remains manageable. The method is applied to several data sets, including fertility and mortality distribution data and Calgary temperature data.
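The computational idea mentioned above, that the sliced Wasserstein distance reduces multivariate comparisons to aggregated univariate Wasserstein distances via sorted (quantile) projections, can be sketched as follows. This is a minimal Monte Carlo illustration, not the paper's implementation; the function name and parameters are chosen for illustration only.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=200, seed=0):
    """Monte Carlo sketch of the sliced 2-Wasserstein distance between two
    empirical distributions (equal sample sizes assumed for simplicity).

    Each random direction theta reduces the d-dimensional problem to a
    univariate one, where W2 has a closed form through sorted samples,
    i.e. the empirical quantile representation.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random directions, normalized to unit vectors on the sphere S^{d-1}.
    thetas = rng.normal(size=(n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    total = 0.0
    for theta in thetas:
        # Sorted projections are the empirical quantile functions of the
        # pushforward (projected) measures; their L2 distance is the
        # univariate W2 distance.
        px = np.sort(X @ theta)
        py = np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)
    return np.sqrt(total / n_projections)
```

Each projection step costs only a sort, so the overall cost is O(n_projections · n log n), which is what keeps the multivariate computation at a manageable level.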