Machine learning has successfully framed many sequential decision-making problems either as supervised prediction or as the identification of an optimal decision-making policy via reinforcement learning (RL). In data-constrained offline settings, both approaches may fail, because they either assume fully optimal behavior or rely on exploring alternatives that may not exist. We introduce an inherently different approach that identifies possible "dead-ends" of a state space. We focus on the condition of patients in the intensive care unit, where a "medical dead-end" indicates that a patient will die regardless of all potential future treatment sequences. We postulate "treatment security" as avoiding treatments with probability proportional to their chance of leading to dead-ends, present a formal proof, and frame dead-end discovery as an RL problem. We then train three independent deep neural models for automated state construction, dead-end discovery, and confirmation. Our empirical results show that dead-ends exist in real clinical data among septic patients, and further reveal gaps between secure treatments and those that were administered.
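To make the notion of "treatment security" concrete, the following is a minimal sketch of one possible reading of it, not the formulation used in the paper. It assumes that, for a given patient state, we already have an estimate of each treatment's probability of leading to a dead-end, and it treats a policy as secure if it selects each treatment with probability at most one minus that estimate. The function names `security_caps` and `is_secure`, the tolerance `tol`, and all numbers are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def security_caps(dead_end_prob: Sequence[float]) -> List[float]:
    """Return an upper bound on the selection probability of each treatment.

    Assumption (not taken from the paper): a treatment whose estimated
    chance of leading to a dead-end is p may be chosen with probability
    at most 1 - p.
    """
    for p in dead_end_prob:
        if not 0.0 <= p <= 1.0:
            raise ValueError("dead-end probabilities must lie in [0, 1]")
    return [1.0 - p for p in dead_end_prob]


def is_secure(
    policy: Sequence[float],
    dead_end_prob: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Check whether a policy respects the cap for every treatment."""
    if len(policy) != len(dead_end_prob):
        raise ValueError("policy and dead_end_prob must have equal length")
    caps = security_caps(dead_end_prob)
    # The small tolerance absorbs floating-point rounding in 1 - p.
    return all(pi <= cap + tol for pi, cap in zip(policy, caps))


# Hypothetical estimates for three treatments in a single state.
dead_end_prob = [0.9, 0.2, 0.0]

# A policy that picks the risky first treatment half of the time.
observed_policy = [0.5, 0.3, 0.2]

# A policy that mostly avoids the risky first treatment.
secure_policy = [0.1, 0.4, 0.5]

print(is_secure(observed_policy, dead_end_prob))
print(is_secure(secure_policy, dead_end_prob))
```

Under these made-up numbers, the first policy violates the cap on the first treatment while the second policy stays within all three caps. A check of this kind is one way to picture the gap between secure treatments and administered treatments mentioned above.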