Bayesian methods, distributionally robust optimization methods, and regularization methods are three pillars of trustworthy machine learning for hedging against distributional uncertainty, e.g., the uncertainty of an empirical distribution relative to the true underlying distribution. This paper investigates the connections among the three frameworks and, in particular, explores why these frameworks tend to have smaller generalization errors. Specifically, first, we suggest a quantitative definition of "distributional robustness", propose the concept of a "robustness measure", and formalize several philosophical concepts in distributionally robust optimization. Second, we show that Bayesian methods are distributionally robust in the probably approximately correct (PAC) sense; in addition, by constructing a Dirichlet-process-like prior in Bayesian nonparametrics, we prove that any regularized empirical risk minimization method is equivalent to a Bayesian method. Third, we show that the generalization errors of machine learning models can be characterized using the distributional uncertainty of the nominal distribution and the robustness measures of these models; this provides a new perspective for bounding generalization errors and thus explains why distributionally robust machine learning models, Bayesian models, and regularized models tend to have smaller generalization errors.
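The claimed equivalence between regularized empirical risk minimization and Bayesian methods extends a familiar finite-dimensional fact; as a hedged illustration (not the paper's Dirichlet-process-like construction), maximum a posteriori (MAP) estimation already couples an empirical risk with a regularizer given by the negative log-prior:

\begin{align*}
  \hat{\theta}_{\mathrm{MAP}}
    &= \arg\max_{\theta}\; p(\theta)\prod_{i=1}^{n} p(x_i \mid \theta) \\
    &= \arg\min_{\theta}\; \underbrace{\sum_{i=1}^{n} -\log p(x_i \mid \theta)}_{\text{empirical risk}}
       \;+\; \underbrace{\bigl(-\log p(\theta)\bigr)}_{\text{regularizer}},
\end{align*}

so, for instance, a Gaussian prior $p(\theta)\propto e^{-\lambda\|\theta\|_2^2}$ recovers $\ell_2$ (ridge) regularization.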
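The third claim, that generalization errors can be bounded via the distributional uncertainty of the nominal distribution together with a robustness measure, follows the standard distributionally robust optimization reasoning; a minimal sketch, assuming the true distribution $P$ lies in an ambiguity ball of radius $\epsilon$ around the empirical distribution $\hat{P}_n$ (the paper's specific robustness measure and ambiguity set may differ):

\begin{equation*}
  \mathbb{E}_{P}\bigl[\ell(\theta; X)\bigr]
  \;\le\;
  \sup_{Q \in B_{\epsilon}(\hat{P}_n)} \mathbb{E}_{Q}\bigl[\ell(\theta; X)\bigr]
  \qquad \text{whenever } P \in B_{\epsilon}(\hat{P}_n),
\end{equation*}

so the worst-case (robust) training objective upper-bounds the out-of-sample risk, with the gap controlled by the distributional uncertainty $\epsilon$ and the model's robustness over $B_{\epsilon}(\hat{P}_n)$.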