Although reinforcement learning (RL) has had tremendous success in many fields, applying RL to real-world settings such as healthcare is challenging when the reward is hard to specify and no exploration is allowed. In this work, we focus on recovering clinicians' rewards in treating patients. We incorporate what-if reasoning to explain clinicians' actions based on future outcomes. We use generalized additive models (GAMs), a class of accurate, interpretable models, to recover the reward. On both simulated data and a real-world hospital dataset, we show that our model outperforms baselines. Finally, our model's explanations match several clinical guidelines for treating patients, whereas we find that the previously used linear model often contradicts them.