Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning

In safety-critical decision-making scenarios, being able to identify worst-case outcomes, or dead-ends, is crucial for developing safe and reliable policies in practice. These situations are typically rife with uncertainty due to unknown or stochastic characteristics of the environment as well as limited offline training data. As a result, the value of a decision at any time point should be based on the distribution of its anticipated effects. We propose a framework to identify worst-case decision points by explicitly estimating distributions of the expected return of a decision. These estimates enable earlier indication of dead-ends in a manner that is tunable based on the risk tolerance of the designed task. We demonstrate the utility of Distributional Dead-end Discovery (DistDeD) in a toy domain as well as when assessing the risk of severely ill patients in the intensive care unit reaching a point where death is unavoidable. We find that DistDeD significantly improves over prior discovery approaches, providing indications of the risk 10 hours earlier on average and increasing detection by 20%.
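The idea of a risk-tunable flag computed from an estimated return distribution can be illustrated with a minimal sketch. This is not the DistDeD implementation: the abstract does not say which risk measure is used, so the sketch assumes conditional value-at-risk (CVaR) over equally weighted return quantiles, and the function names, the `alpha` risk level, the `threshold`, and the convention that returns lie in [-1, 0] are all hypothetical choices made for illustration.

```python
import numpy as np

def cvar(quantiles, alpha):
    """Mean of the worst alpha-fraction of equally weighted return quantiles.

    Assumption: the return distribution is summarized by a 1-D array of
    quantile estimates, each carrying equal probability mass.
    """
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * q.size)))  # number of tail samples to average
    return q[:k].mean()

def flag_risky_actions(return_quantiles, alpha=0.1, threshold=-0.5):
    """Score each action by the CVaR of its return distribution.

    return_quantiles: array of shape (n_actions, n_quantiles).
    An action is flagged when its CVaR is at or below the (hypothetical)
    threshold. A smaller alpha looks only at the worst outcomes, so it is
    more risk-averse; alpha=1.0 reduces to the ordinary expected return.
    """
    scores = np.array([cvar(q, alpha) for q in return_quantiles])
    return scores, scores <= threshold

# Hypothetical example: returns in [-1, 0], where -1 marks the negative outcome.
quantiles = np.array([
    [-0.1] * 10,             # action 0: mildly negative, low variance
    [-1.0, -1.0] + [0.0] * 8 # action 1: usually fine, occasionally catastrophic
])
risk_averse_scores, risk_averse_flags = flag_risky_actions(quantiles, alpha=0.2)
risk_neutral_scores, risk_neutral_flags = flag_risky_actions(quantiles, alpha=1.0)
```

With these made-up numbers, the risk-averse setting flags action 1 because its worst 20% of outcomes are catastrophic, while the risk-neutral setting (a plain expectation) flags nothing. That gap is the kind of earlier warning a distributional estimate can provide over an expected-value estimate.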
