We develop a Distributionally Robust Optimization (DRO) formulation for Multiclass Logistic Regression (MLR) that can tolerate data contaminated by outliers. The DRO framework uses a probabilistic ambiguity set defined as a ball of distributions that are close to the empirical distribution of the training set in the sense of the Wasserstein metric. We relax the DRO formulation into a regularized learning problem whose regularizer is a norm of the coefficient matrix. We establish out-of-sample performance guarantees for the solutions to our model, offering insights into the role of the regularizer in controlling the prediction error. We apply the proposed method to render deep CNN-based image classifiers robust to random and adversarial attacks. Specifically, using the MNIST and CIFAR-10 datasets, we demonstrate reductions in the test error rate by up to 78.8% and in the loss by up to 90.8%. We also show that, with a limited number of perturbed images in the training set, our method can improve the error rate by up to 49.49% and the loss by up to 68.93% compared to Empirical Risk Minimization (ERM), converging faster to an ideal loss/error rate as the number of perturbed images increases.
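To make the relaxation concrete, the following is a minimal sketch of a norm-regularized MLR objective of the kind the abstract describes: the empirical log-loss plus a penalty on a norm of the coefficient matrix, with the penalty weight playing the role of the Wasserstein ambiguity-set radius. The function name, the choice of the spectral norm, and the exact coupling between the radius and the penalty are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with a max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dro_relaxed_loss(B, X, y, eps, norm_ord=2):
    """Sketch of the relaxed DRO objective for multiclass logistic
    regression: average cross-entropy loss plus eps times a norm of
    the coefficient matrix B (shape: features x classes).
    `eps` stands in for the ambiguity-set radius; the spectral norm
    (norm_ord=2) is an assumed placeholder for the paper's norm."""
    n = X.shape[0]
    P = softmax(X @ B)                              # n x K class probabilities
    log_loss = -np.log(P[np.arange(n), y]).mean()   # empirical cross-entropy
    return log_loss + eps * np.linalg.norm(B, ord=norm_ord)
```

Setting `eps = 0` recovers plain ERM; a larger radius shrinks the coefficient matrix more aggressively, which is the mechanism the out-of-sample guarantees tie to the prediction error.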