This monograph develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data using Distributionally Robust Optimization (DRO) under the Wasserstein metric. Beginning with fundamental properties of the Wasserstein metric and the DRO formulation, we explore duality to arrive at tractable formulations and develop finite-sample, as well as asymptotic, performance guarantees. We consider a series of learning problems, including (i) distributionally robust linear regression; (ii) distributionally robust regression with group structure in the predictors; (iii) distributionally robust multi-output regression and multiclass classification; (iv) optimal decision making that combines distributionally robust regression with nearest-neighbor estimation; (v) distributionally robust semi-supervised learning; and (vi) distributionally robust reinforcement learning. For each problem, we derive a tractable DRO relaxation, establish a connection between robustness and regularization, and obtain bounds on the prediction and estimation errors of the solution. Beyond theory, we include numerical experiments and case studies using synthetic and real data. The real-data experiments all concern health informatics problems, the application area that provided the initial impetus for this work.
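As a rough illustration of the robustness-regularization connection mentioned above, the following sketch writes out a generic Wasserstein DRO problem and the bound that follows from Kantorovich-Rubinstein duality for the order-1 Wasserstein distance. The notation is an assumption made for this sketch and is not taken from the abstract: $z = (x, y)$ is a data point, $h_\beta$ the loss, $\hat{P}_N$ the empirical distribution of $N$ samples, $\epsilon$ the radius of the ambiguity set, $\|\cdot\|$ the norm defining the transport cost, and $\|\cdot\|_*$ its dual norm.

```latex
% Generic Wasserstein DRO problem (notation assumed for illustration)
\inf_{\beta} \; \sup_{Q \in \Omega} \; \mathbb{E}^{Q}\big[ h_\beta(z) \big],
\qquad
\Omega = \big\{ Q : W_1(Q, \hat{P}_N) \le \epsilon \big\}.

% If h_\beta is L(\beta)-Lipschitz in z with respect to \|\cdot\|, then
% |E^Q[h_\beta] - E^{\hat{P}_N}[h_\beta]| \le L(\beta) W_1(Q, \hat{P}_N), so
\sup_{Q \in \Omega} \mathbb{E}^{Q}\big[ h_\beta(z) \big]
\;\le\;
\frac{1}{N} \sum_{i=1}^{N} h_\beta(z_i) \;+\; \epsilon \, L(\beta).

% Example: absolute-residual loss h_\beta(x, y) = |y - x^\top \beta|
% = |(-\beta, 1)^\top z|, whose Lipschitz constant is \|(-\beta, 1)\|_* :
\inf_{\beta} \;
\frac{1}{N} \sum_{i=1}^{N} \big| y_i - x_i^\top \beta \big|
\;+\; \epsilon \, \big\| (-\beta, 1) \big\|_* .
```

Under these assumptions the worst-case expected loss is bounded by the empirical loss plus a norm penalty on the coefficients, so the ambiguity-set radius $\epsilon$ plays the role of a regularization weight. The relaxations derived in the monograph for each problem class may differ in their details.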