Treatment policies learned via reinforcement learning (RL) from observational health data are sensitive to subtle choices in study design. We highlight a simple approach, trajectory inspection, for bringing clinicians into an iterative design process for model-based RL studies. We inspect trajectories where the model recommends unexpectedly aggressive treatments or expects its recommendations to lead to much more positive outcomes than those actually observed. Then, we examine clinical trajectories simulated with the learned model and policy alongside the actual hospital course to uncover possible modeling issues. To demonstrate that this approach yields insights, we apply it to recent work on RL for inpatient sepsis management. We find that a design choice around maximum trajectory length leads to a model bias towards discharge, that the RL policy's preference for high vasopressor doses may be linked to small sample sizes, and that the model has a clinically implausible expectation of discharge without first weaning the patient off vasopressors.
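For concreteness, the following minimal Python sketch shows one way the two inspection heuristics described above could be implemented for a tabular, model-based RL setup. All names here (P, R, pi_rl, V_rl, flag_trajectories, simulate_rollout, aggressive_actions, value_gap_threshold) are hypothetical illustrations under assumed data structures, not code from the original study.

```python
import numpy as np

# Assumed inputs (hypothetical, for illustration only):
#   P[s, a]     -> 1-D array of estimated transition probabilities over next states
#   R[s, a]     -> estimated reward for taking action a in state s
#   pi_rl[s]    -> action index recommended by the learned RL policy in state s
#   V_rl[s]     -> model-based value estimate of the RL policy in state s

def flag_trajectories(trajectories, pi_rl, V_rl, aggressive_actions,
                      value_gap_threshold=0.5):
    """Flag observed trajectories for clinician review.

    A trajectory is flagged if (a) the RL policy recommends an unexpectedly
    aggressive treatment at any visited state, or (b) the model's value
    estimate at the initial state exceeds the observed return by a large margin.
    """
    flagged = []
    for traj in trajectories:  # traj: list of (state, action, reward) tuples
        states = [s for s, _, _ in traj]
        observed_return = sum(r for _, _, r in traj)
        aggressive = any(pi_rl[s] in aggressive_actions for s in states)
        optimistic = V_rl[states[0]] - observed_return > value_gap_threshold
        if aggressive or optimistic:
            flagged.append(traj)
    return flagged

def simulate_rollout(P, R, pi_rl, s0, max_steps=20, rng=None):
    """Roll out the learned model under the RL policy from a patient's initial
    state, for side-by-side comparison with the actual hospital course."""
    rng = rng or np.random.default_rng()
    s, rollout = s0, []
    for _ in range(max_steps):  # max_steps caps rollout length; absorbing
        a = pi_rl[s]            # states (e.g. discharge, death) could also stop it
        rollout.append((s, a, R[s, a]))
        s = rng.choice(len(P[s, a]), p=P[s, a])
    return rollout
```

Flagged trajectories, together with rollouts simulated from the same initial state, could then be reviewed with clinicians side by side against the actual hospital course to surface modeling issues of the kind reported above.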