Dynamic diagnosis is desirable when medical tests are costly or time-consuming. In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate diagnosis at a low cost. Clinical diagnostic data are often highly imbalanced; therefore, we aim to maximize the $F_1$ score instead of minimizing the error rate. However, optimizing the non-concave $F_1$ score is not a classic RL problem, which invalidates standard RL methods. To remedy this issue, we develop a reward-shaping approach, leveraging properties of the $F_1$ score and duality of policy optimization, to provably find the set of all Pareto-optimal policies for budget-constrained $F_1$ score maximization. To handle the combinatorially complex state space, we propose a Semi-Model-based Deep Diagnosis Policy Optimization (SM-DDPO) framework that is compatible with end-to-end training and online learning. SM-DDPO is tested on diverse clinical tasks: ferritin abnormality detection, sepsis mortality prediction, and acute kidney injury diagnosis. Experiments with real-world data validate that SM-DDPO trains efficiently and identifies all Pareto-front solutions. Across all tasks, SM-DDPO achieves state-of-the-art diagnosis accuracy (in some cases higher than conventional methods) with up to $85\%$ reduction in testing cost. The code is available at [https://github.com/Zheng321/Deep-Reinforcement-Learning-for-Cost-Effective-Medical-Diagnosis].
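For concreteness, the budget-constrained objective described above can be sketched as follows; the notation here is illustrative and assumed (a policy $\pi$ sequentially selects test panels, $C(\pi)$ denotes its expected testing cost, $B$ a budget, and $\mathrm{TP}$, $\mathrm{FP}$, $\mathrm{FN}$ the counts induced by the resulting diagnoses), not taken verbatim from the paper:
\begin{equation*}
\max_{\pi} \; F_1(\pi) \;=\; \frac{2\,\mathrm{TP}(\pi)}{2\,\mathrm{TP}(\pi) + \mathrm{FP}(\pi) + \mathrm{FN}(\pi)}
\qquad \text{s.t.} \qquad C(\pi) \le B .
\end{equation*}
Because $F_1$ is a non-concave function of these counts rather than a sum of per-step rewards, the problem does not reduce directly to a standard RL objective; sweeping the budget $B$ (or an associated dual variable) traces out the Pareto front of accuracy-versus-cost trade-offs referred to above.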