On a Reinforcement Learning Methodology for Epidemic Control, with application to COVID-19

This paper presents a real-time, data-driven decision support framework for epidemic control. We combine a compartmental epidemic model with sequential Bayesian inference and reinforcement learning (RL) controllers that adaptively choose intervention levels to balance disease burden, such as intensive care unit (ICU) load, against socio-economic costs. We construct a context-specific cost function using empirical experiments and expert feedback. We study two RL policies: an ICU threshold rule computed via Monte Carlo grid search, and a policy based on a posterior-averaged Q-learning agent. We validate the framework by fitting the epidemic model to publicly available ICU occupancy data from the COVID-19 pandemic in England and then generating counterfactual roll-out scenarios under each RL controller, which allows us to compare the RL policies to the historical government strategy. Over a 300-day period and for a range of cost parameters, both controllers substantially reduce ICU burden relative to the observed interventions, illustrating how Bayesian sequential learning combined with RL can support the design of epidemic control policies.
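To make the first policy concrete, the following is a minimal sketch of how an ICU threshold rule might be tuned by Monte Carlo grid search. It is not the model or cost function used in the paper: the toy SIR-style dynamics, the function names `simulate_cost` and `grid_search`, and every numeric parameter (population size, rates, the uniform range standing in for posterior samples of the transmission rate, the per-day intervention cost `cost_weight`) are assumptions chosen purely for illustration.

```python
import random


def simulate_cost(threshold, beta, rng, days=300, cost_weight=50.0):
    """Roll out a toy discrete-time SIR-style model under an ICU threshold rule.

    Returns summed daily ICU occupancy plus cost_weight for every day on which
    the intervention is active. All parameters below are illustrative assumptions.
    """
    n = 1_000_000.0                      # population size (assumed)
    s, i, icu = n - 100.0, 100.0, 0.0    # susceptible, infectious, ICU occupancy
    gamma = 0.1                          # recovery rate per day (assumed)
    icu_rate = 0.002                     # daily ICU admission rate per infectious person (assumed)
    icu_exit = 0.1                       # daily ICU discharge rate (assumed)
    total = 0.0
    for _ in range(days):
        # Threshold rule: restrict contacts whenever ICU occupancy exceeds the threshold.
        restricted = icu > threshold
        eff_beta = beta * (0.4 if restricted else 1.0)
        noise = rng.lognormvariate(0.0, 0.1)  # multiplicative transmission noise
        new_inf = min(s, eff_beta * noise * s * i / n)
        s -= new_inf
        icu += icu_rate * i - icu_exit * icu
        i += new_inf - gamma * i
        total += icu + (cost_weight if restricted else 0.0)
    return total


def grid_search(thresholds, n_samples=200, seed=0):
    """Pick the threshold with the lowest Monte Carlo estimate of expected cost."""
    rng = random.Random(seed)
    # Stand-in for posterior samples of the transmission rate (assumed range).
    betas = [rng.uniform(0.2, 0.35) for _ in range(n_samples)]
    expected = {}
    for th in thresholds:
        # Reuse the same random seeds for every threshold (common random numbers).
        costs = [
            simulate_cost(th, b, random.Random(seed + k))
            for k, b in enumerate(betas)
        ]
        expected[th] = sum(costs) / len(costs)
    best = min(expected, key=expected.get)
    return best, expected


if __name__ == "__main__":
    best, expected = grid_search([50, 100, 200, 400, 800])
    print("best threshold:", best)
    for th, cost in expected.items():
        print(th, round(cost, 1))
```

Averaging the cost over sampled transmission rates mimics, in a very reduced form, the idea of evaluating a controller against posterior uncertainty rather than a single point estimate. Changing `cost_weight` shifts the trade-off between ICU burden and intervention cost, which is the role the cost parameters play in the comparison described above.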
