Classical regression models do not cover non-Euclidean data that reside in a general metric space, and the existing literature on non-Euclidean regression has largely focused on scenarios where either the predictors or the responses are random objects, i.e., non-Euclidean, but not both. In this paper we propose geodesic optimal transport regression models for the case where both predictors and responses lie in a common geodesic metric space and the predictors may comprise one or several random objects. This extends classical multiple regression to a scenario that has not been considered before. The approach is based on the concept of optimal geodesic transports, which we define as an extension of the notion of optimal transports in distribution spaces to more general geodesic metric spaces, where we characterize optimal transports as transports along geodesics. The proposed regression models cover the relation between non-Euclidean responses and vectors of non-Euclidean predictors in many spaces of practical statistical interest. These include one-dimensional distributions viewed as elements of the 2-Wasserstein space and multidimensional distributions with the Fisher-Rao metric, which are represented as data on the Hilbert sphere. Also included are data on finite-dimensional Riemannian manifolds, with an emphasis on spheres, covering directional and compositional data, as well as data that consist of symmetric positive definite matrices. We illustrate the utility of geodesic optimal transport regression with data on summer temperature distributions and human mortality.
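To make the notion of transport along a geodesic concrete, the following minimal sketch treats the simplest of the spaces named above, one-dimensional distributions in the 2-Wasserstein space. It relies on a standard fact: in this space the distance between two distributions is the L2 distance between their quantile functions, and the geodesic connecting them is the pointwise linear interpolation of those quantile functions. The function names, the uniform midpoint probability grid, and the use of empirical quantiles are illustrative assumptions and are not taken from the paper; the sketch does not implement the proposed regression model itself.

```python
import numpy as np


def empirical_quantiles(sample, grid):
    """Empirical quantile function of a 1-D sample, evaluated on a probability grid."""
    return np.quantile(np.asarray(sample, dtype=float), grid)


def wasserstein2_distance(q0, q1):
    """2-Wasserstein distance between two 1-D distributions.

    Both inputs are quantile functions tabulated on the same uniform midpoint
    grid of (0, 1) (an assumption of this sketch), so the L2 integral is
    approximated by a plain mean of squared differences.
    """
    q0 = np.asarray(q0, dtype=float)
    q1 = np.asarray(q1, dtype=float)
    return float(np.sqrt(np.mean((q0 - q1) ** 2)))


def geodesic_point(q0, q1, t):
    """Point at time t in [0, 1] on the Wasserstein geodesic from q0 to q1.

    In one dimension the geodesic is linear interpolation of quantile functions.
    """
    return (1.0 - t) * np.asarray(q0, dtype=float) + t * np.asarray(q1, dtype=float)


# Illustrative usage with synthetic samples (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
grid = (np.arange(200) + 0.5) / 200          # uniform midpoint grid on (0, 1)
q_start = empirical_quantiles(rng.normal(0.0, 1.0, size=500), grid)
q_end = empirical_quantiles(rng.normal(2.0, 0.5, size=500), grid)
q_mid = geodesic_point(q_start, q_end, 0.5)  # halfway along the geodesic

total = wasserstein2_distance(q_start, q_end)
half = wasserstein2_distance(q_start, q_mid)
print(f"W2(start, end) = {total:.4f}, W2(start, midpoint) = {half:.4f}")
```

Because the interpolation is linear in the quantile functions, the midpoint lies at half the total distance from the starting distribution, which is the constant-speed property that defines a geodesic.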