The performance of decision policies and prediction models often deteriorates when applied to environments different from the ones seen during training. To ensure reliable operation, we propose and analyze the stability of a system under distribution shift, which is defined as the smallest change in the underlying environment that causes the system's performance to deteriorate beyond a permissible threshold. In contrast to standard tail risk measures and distributionally robust losses that require the specification of a plausible magnitude of distribution shift, the stability measure is defined in terms of a more intuitive quantity: the level of acceptable performance degradation. We develop a minimax optimal estimator of stability and analyze its convergence rate, which exhibits a fundamental phase shift behavior. Our characterization of the minimax convergence rate shows that evaluating stability against large performance degradation incurs a statistical cost. Empirically, we demonstrate the practical utility of our stability framework by using it to compare system designs on problems where robustness to distribution shift is critical.
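To make the definition concrete, the following is a minimal sketch of a plug-in stability estimate. It rests on assumptions the abstract does not state: distribution shift is measured by KL divergence, larger loss means worse performance, and the environment is represented by an i.i.d. sample of losses. The function name `kl_stability` and its parameters are hypothetical, and this plug-in version is not claimed to be the minimax optimal estimator described above. It uses the standard dual form inf { KL(Q || P) : E_Q[R] >= y } = sup_{λ >= 0} { λy - log E_P[exp(λR)] }, evaluated on the empirical distribution.

```python
import numpy as np


def kl_stability(losses, threshold, tol=1e-10):
    """Plug-in estimate of inf { KL(Q || P_n) : E_Q[loss] >= threshold }.

    Illustrative assumption: shift is measured by KL divergence and
    P_n is the empirical distribution of the observed losses.
    """
    r = np.asarray(losses, dtype=float)
    if threshold <= r.mean():
        # Performance already meets or exceeds the threshold: no shift needed.
        return 0.0
    r_max = r.max()
    if threshold > r_max:
        # No reweighting of the sample can push the mean above its maximum.
        return float("inf")
    if threshold == r_max:
        # All mass must move onto the maximal observations.
        return float(-np.log(np.mean(r == r_max)))

    def tilted_mean(lam):
        # Mean loss under the exponentially tilted distribution,
        # shifted by r_max for numerical stability.
        w = np.exp(lam * (r - r_max))
        return float(np.sum(w * r) / np.sum(w))

    # The tilted mean is nondecreasing in lam; bracket the root, then bisect.
    lo, hi = 0.0, 1.0
    while tilted_mean(hi) < threshold:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < threshold:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    # Dual objective: lam * threshold - log E_{P_n}[exp(lam * loss)].
    log_mgf = lam * r_max + np.log(np.mean(np.exp(lam * (r - r_max))))
    return float(lam * threshold - log_mgf)
```

Under these assumptions, a larger returned value means a larger shift is needed before the mean loss reaches the threshold, so the system is more stable at that degradation level. A value of zero means the threshold is already reached on the observed data.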