Drug dosing is an important application of AI that can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify two major challenges of using RL for drug dosing: the delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define the PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process), a subclass of POMDPs in which the Markov assumption does not hold specifically because of the prolonged effects of actions. Motivated by the pharmacology literature, we propose a simple and effective approach for converting drug dosing PAE-POMDPs into MDPs, enabling the use of existing RL algorithms to solve such problems. We validate the proposed approach on a toy task and on a challenging glucose control task, for which we devise a clinically inspired reward function. Our results demonstrate that: (1) the proposed method for restoring the Markov assumption leads to significant improvements over a vanilla baseline; (2) the approach is competitive with recurrent policies, which may inherently capture the prolonged effects of actions; (3) it is considerably more time- and memory-efficient than the recurrent baseline and hence more suitable for real-time dosing control systems; and (4) it exhibits favorable qualitative behavior in our policy analysis.
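The abstract does not spell out how the PAE-POMDP-to-MDP conversion works; the sketch below is only an illustration of one plausible reading, in which an exponentially decaying "effective dose" (mirroring first-order drug elimination in pharmacokinetics) is tracked and appended to the observation so that the augmented state is approximately Markov. The wrapper interface, the `decay` parameter, and the max-based update rule are all assumptions for the sake of the example, not the paper's confirmed method.

```python
import numpy as np


class EffectiveDoseWrapper:
    """Hypothetical wrapper that turns a dosing PAE-POMDP into an MDP-style
    environment by augmenting observations with a decaying effective dose.

    Assumes `env` exposes reset() -> obs and step(action) -> (obs, reward, done),
    where `action` is a scalar dose.
    """

    def __init__(self, env, decay=0.9):
        self.env = env
        self.decay = decay          # per-step retention of the lingering dose effect
        self.effective_dose = 0.0

    def reset(self):
        self.effective_dose = 0.0
        obs = self.env.reset()
        return self._augment(obs)

    def step(self, action):
        # Assumed update: the lingering effect is the decayed previous effect
        # or the newly administered dose, whichever dominates.
        self.effective_dose = max(float(action), self.decay * self.effective_dose)
        obs, reward, done = self.env.step(action)
        return self._augment(obs), reward, done

    def _augment(self, obs):
        # Append the effective dose so the agent's state carries the
        # prolonged-action information that the raw observation lacks.
        return np.append(np.asarray(obs, dtype=np.float32), self.effective_dose)
```

Under this reading, any off-the-shelf RL algorithm can then be trained on the augmented state exactly as it would on an ordinary MDP, without recurrent memory.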