Machine learning has successfully framed many sequential decision-making problems as either supervised prediction or the identification of an optimal decision-making policy via reinforcement learning (RL). In data-constrained offline settings, both approaches may fail, because they either assume fully optimal behavior or rely on exploring alternatives that may not exist. We introduce an inherently different approach that identifies possible ``dead-ends'' of a state space. We focus on the condition of patients in the intensive care unit, where a ``medical dead-end'' indicates that a patient will expire regardless of all potential future treatment sequences. We postulate ``treatment security'' as avoiding treatments with probability proportional to their chance of leading to dead-ends, present a formal proof, and frame discovery as an RL problem. We then train three independent deep neural models for automated state construction, dead-end discovery, and confirmation. Our empirical results show that dead-ends exist in real clinical data among septic patients, and further reveal gaps between secure treatments and those that were administered.
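As a rough illustration of the ``treatment security'' idea, the sketch below takes one possible reading of it: the probability of selecting a treatment should not exceed one minus that treatment's estimated chance of leading to a dead-end. This reading, the function names `security_caps` and `is_secure`, and all numbers are assumptions made for illustration; they are not taken from the method or data described above.

```python
def security_caps(p_dead_end):
    """Assumed cap on each treatment's selection probability: 1 - P(dead-end)."""
    return [1.0 - p for p in p_dead_end]


def is_secure(policy, p_dead_end, tol=1e-9):
    """Return True if every treatment is selected no more often than its cap."""
    caps = security_caps(p_dead_end)
    return all(pi <= cap + tol for pi, cap in zip(policy, caps))


# Hypothetical estimates for three treatments in a single patient state.
p_dead_end = [0.05, 0.90, 0.30]

# A made-up observed treatment distribution that favors the risky treatment.
observed = [0.2, 0.5, 0.3]
# A made-up alternative that mostly avoids it.
cautious = [0.6, 0.1, 0.3]

print(is_secure(observed, p_dead_end))
print(is_secure(cautious, p_dead_end))
```

Under these assumed numbers, the second treatment has a cap of about 0.1, so the observed distribution (which selects it half the time) violates the condition while the cautious one satisfies it. A gap of this kind between administered and secure treatments is what the empirical analysis is described as revealing.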