We propose SLTD (`Sequential Learning-to-Defer'), a framework for learning to defer pre-emptively to an expert in sequential decision-making settings. SLTD measures the likelihood that deferring now, rather than later, improves the value, based on the underlying uncertainty in the dynamics; in particular, we account for non-stationarity in the dynamics to learn the deferral policy accurately. We demonstrate that pre-emptive deferral can identify regions where the current policy has a low probability of improving outcomes. SLTD outperforms existing non-sequential learning-to-defer baselines, whilst reducing overall uncertainty, on multiple synthetic and real-world simulators with non-stationary dynamics. We further derive and decompose the propagated (long-term) uncertainty so that the domain expert can interpret it, providing an indication of when the model's performance is reliable.
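To make the deferral criterion concrete, the following is a minimal illustrative sketch (not the authors' implementation) of a probability-of-improvement deferral rule under dynamics uncertainty; all names (`value_samples_policy`, `value_samples_expert`, `threshold`) are hypothetical.

```python
# Illustrative sketch only: defer pre-emptively when, across posterior samples
# of the (possibly non-stationary) dynamics, the value of deferring to the
# expert exceeds the value of continuing with the current policy with
# sufficiently high probability. Names and numbers below are hypothetical.
import numpy as np

def should_defer(value_samples_policy: np.ndarray,
                 value_samples_expert: np.ndarray,
                 threshold: float = 0.5) -> bool:
    """Return True if the estimated probability that deferring now improves
    the long-term value exceeds `threshold`."""
    prob_improvement = np.mean(value_samples_expert > value_samples_policy)
    return bool(prob_improvement > threshold)

# Usage: value samples would come from rolling out both options through an
# ensemble or Bayesian model of the dynamics (synthetic numbers here).
rng = np.random.default_rng(0)
v_policy = rng.normal(1.0, 0.5, size=1000)   # value under the current policy
v_expert = rng.normal(1.3, 0.4, size=1000)   # value if we defer to the expert now
print(should_defer(v_policy, v_expert))      # True -> defer pre-emptively
```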