Trustworthy machine learning aims to combat distributional uncertainty, that is, the discrepancy between the training data distribution and the population distribution. Typical treatment frameworks include the Bayesian approach, (min-max) distributionally robust optimization (DRO), and regularization. However, two issues arise: 1) all of these methods yield biased estimators of the true optimal cost; 2) the prior distribution in the Bayesian method, the radius of the distributional ball in the DRO method, and the regularizer in the regularization method are difficult to specify. This paper studies a new framework that unifies the three approaches and addresses both issues. The asymptotic properties (e.g., consistency and asymptotic normality), non-asymptotic properties (e.g., unbiasedness and a generalization error bound), and a Monte Carlo-based solution method for the proposed model are studied. The new model reveals the trade-off between robustness to unseen data and specificity to the training data.