Dynamic diagnosis is desirable when medical tests are costly or time-consuming. In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate testing at a low cost. Clinical diagnostic data are often highly imbalanced; therefore, we aim to maximize the $F_1$ score instead of the error rate. However, optimizing the non-concave $F_1$ score is not a classic RL problem, and thus standard RL methods do not apply. To remedy this issue, we develop a reward shaping approach, leveraging properties of the $F_1$ score and the duality of policy optimization, to provably find the set of all Pareto-optimal policies for budget-constrained $F_1$ score maximization. To handle the combinatorially complex state space, we propose a Semi-Model-based Deep Diagnosis Policy Optimization (SM-DDPO) framework that is compatible with end-to-end training and online learning. SM-DDPO is tested on diverse clinical tasks: ferritin abnormality detection, sepsis mortality prediction, and acute kidney injury diagnosis. Experiments with real-world data validate that SM-DDPO trains efficiently and identifies all Pareto-front solutions. Across all tasks, SM-DDPO achieves state-of-the-art diagnosis accuracy (in some cases higher than conventional methods) with up to an $85\%$ reduction in testing cost. The code is available at [https://github.com/Zheng321/Blood_Panel].
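For intuition, here is a minimal sketch of why $F_1$ maximization falls outside standard RL and how a shaped linear reward can stand in for it; this is an illustrative reading of the abstract, not the paper's exact construction, and the surrogate $r_\lambda$ below is an assumed form. With true-positive, false-positive, and false-negative counts $\mathrm{TP}(\pi)$, $\mathrm{FP}(\pi)$, and $\mathrm{FN}(\pi)$ induced by a policy $\pi$, the objective
$$F_1(\pi) \;=\; \frac{2\,\mathrm{TP}(\pi)}{2\,\mathrm{TP}(\pi) + \mathrm{FP}(\pi) + \mathrm{FN}(\pi)}$$
is a ratio of expected counts rather than an expected cumulative reward, so it cannot be optimized directly by standard RL. A shaped surrogate such as
$$r_\lambda(\pi) \;=\; \mathrm{TP}(\pi) \;-\; \lambda\,\bigl(\mathrm{FP}(\pi) + \mathrm{FN}(\pi)\bigr), \qquad \lambda \ge 0,$$
is linear in the counts and hence a valid RL objective; sweeping $\lambda$ jointly with the test-cost budget yields a family of maximizers among which the Pareto-optimal trade-offs between $F_1$ score and testing cost can be identified.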