We consider stochastic optimization problems that use observed data to estimate essential characteristics of the random quantities involved. Sample average approximation (SAA), or empirical (plug-in) estimation, is a very popular way to use data in optimization. It is well known that the SAA estimate of the optimal value suffers from downward bias. We propose to use smooth estimators rather than empirical ones in optimization problems. We establish consistency results for the optimal value and the set of optimal solutions of the new problem formulation. The performance of the proposed approach is compared to that of SAA both theoretically and numerically. We analyze the bias of the new problems and identify sufficient conditions for less biased estimation of the optimal value of the true problem, while the error of the new estimator remains controlled. We show that these conditions are satisfied for many popular statistical problems, such as regression models, classification problems, and optimization problems with the Average (Conditional) Value-at-Risk. In particular, smoothing the least-squares objective of a regression problem with a normal kernel leads to ridge regression. Our numerical experience shows that the new estimators frequently also exhibit smaller variance and smaller mean-squared error than those of SAA.
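As a minimal illustration of the ridge-regression connection (a sketch under the assumption that only the covariates $x_i$ are smoothed, with an assumed bandwidth $h$): perturbing each $x_i$ by $\varepsilon \sim \mathcal{N}(0, h^2 I)$ and averaging the least-squares loss over $\varepsilon$ gives
\[
\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{\varepsilon}\!\left[\bigl(y_i - (x_i+\varepsilon)^{\top}\beta\bigr)^{2}\right]
= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^{2} + h^{2}\lVert\beta\rVert_2^{2},
\]
since the cross term vanishes in expectation and $\mathbb{E}\bigl[(\varepsilon^{\top}\beta)^{2}\bigr] = h^{2}\lVert\beta\rVert_2^{2}$; this is exactly the ridge-regression objective with penalty parameter $h^{2}$.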