In this work, we study data-driven decision-making and depart from the classical independent and identically distributed (i.i.d.) assumption. We present a new framework in which historical samples are generated from unknown and different distributions, a setting we dub heterogeneous environments. These distributions are assumed to lie in a heterogeneity ball of known radius centered around the (also unknown) future (out-of-sample) distribution on which the performance of a decision will be evaluated. We quantify, as a function of the radius of the heterogeneity ball, the asymptotic worst-case regret achievable by central data-driven policies such as Sample Average Approximation, as well as by rate-optimal ones. Our work shows that the type of achievable performance varies considerably across different combinations of problem classes and notions of heterogeneity. We demonstrate the versatility of our framework by comparing achievable guarantees for the heterogeneous versions of widely studied data-driven problems such as pricing, ski-rental, and newsvendor. En route, we establish a new connection between data-driven decision-making and distributionally robust optimization.