When the underlying probability distribution in a stochastic optimization problem is observed only through data, various data-driven formulations have been studied to obtain approximately optimal solutions. We show that, in a sense, no such formulation can theoretically improve the statistical quality of the solution obtained from empirical optimization. We argue this by proving that the first-order behavior of the optimality gap against the oracle best solution, which accounts for both bias and variance, for any data-driven solution is second-order stochastically dominated by that of empirical optimization, as long as suitable smoothness holds with respect to the underlying distribution. We demonstrate this impossibility of improvement in a range of examples, including regularized optimization, distributionally robust optimization, parametric optimization, and Bayesian generalizations. We also discuss the connections of our results to semiparametric statistical inference and other perspectives in the data-driven optimization literature.
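As an illustrative sketch of the claim (not the paper's proof technique), consider the toy problem min_x E[(x - Z)^2], whose oracle solution is x* = E[Z], so the optimality gap of any candidate x is (x - E[Z])^2. The hypothetical Monte Carlo below compares empirical optimization (the sample mean) against a ridge-regularized data-driven solution; the problem, shrinkage rule, and parameter values are illustrative assumptions, chosen so that the two mean gaps come out comparable, consistent with the first-order equivalence the abstract describes.

```python
# Toy Monte Carlo: optimality gaps of empirical vs. regularized solutions
# on min_x E[(x - Z)^2] with Z ~ N(mu, 1); oracle solution x* = mu.
# All parameter choices here are illustrative assumptions.
import random

random.seed(0)
mu, n, reps, lam = 2.0, 50, 2000, 0.5  # true mean, sample size, replications, ridge weight

gap_emp, gap_reg = [], []
for _ in range(reps):
    data = [random.gauss(mu, 1.0) for _ in range(n)]
    xbar = sum(data) / n
    x_emp = xbar                    # empirical optimizer: the sample mean
    x_reg = xbar / (1.0 + lam / n)  # ridge-regularized solution, shrunk toward 0
    # optimality gap for this problem: E[(x - Z)^2] - E[(x* - Z)^2] = (x - mu)^2
    gap_emp.append((x_emp - mu) ** 2)
    gap_reg.append((x_reg - mu) ** 2)

print(f"mean optimality gap, empirical:   {sum(gap_emp) / reps:.5f}")
print(f"mean optimality gap, regularized: {sum(gap_reg) / reps:.5f}")
```

With shrinkage of order 1/n, the regularizer trades a small squared bias for a small variance reduction; to first order the two mean gaps match, mirroring the abstract's point that the regularized formulation does not statistically improve on empirical optimization.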