VacSIM: Learning Effective Strategies for COVID-19 Vaccine Distribution using Reinforcement Learning

Raghav Awasthi, Keerat Kaur Guliani, Saif Ahmad Khan, Aniket Vashishtha, Mehrab Singh Gill, Arshita Bhatt, Aditya Nagori, Aniket Gupta, Ponnurangam Kumaraguru, Tavpritesh Sethi

From arXiv, 14 pages, 5 figures

A COVID-19 vaccine is our best bet for mitigating the ongoing onslaught of the pandemic. However, the vaccine is also expected to be a limited resource. An optimal allocation strategy, especially in countries with access inequities and temporal separation of hot-spots, might be an effective way of halting the spread of the disease. We approach this problem by proposing a novel pipeline, VacSIM, that dovetails Deep Reinforcement Learning models into a Contextual Bandits approach for optimizing the distribution of COVID-19 vaccine. Whereas the Reinforcement Learning models suggest better actions and rewards, Contextual Bandits allow online modifications that may need to be implemented on a day-to-day basis in real-world scenarios. We evaluate this framework against a naive allocation approach that distributes vaccine in proportion to the incidence of COVID-19 cases in five states across India (Assam, Delhi, Jharkhand, Maharashtra and Nagaland), and demonstrate up to 9039 potential infections prevented and a significant increase in the efficacy of limiting the spread over a period of 45 days through the VacSIM approach. Our models and the platform are extensible to all states of India and potentially across the globe. We also propose novel evaluation strategies, including standard compartmental model-based projections and a causality-preserving evaluation of our model. Since all models carry assumptions that may need to be tested in various contexts, we open source our model VacSIM and contribute a new reinforcement learning environment compatible with OpenAI gym to make it extensible for real-world applications across the globe (http://vacsim.tavlab.iiitd.edu.in:8000/).
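The naive baseline mentioned in the abstract, allocating vaccine in proportion to case incidence, can be illustrated with a minimal sketch. This is not the authors' code: the function name `proportional_allocation`, the largest-remainder rounding, and the case counts below are all assumptions made for illustration, and the numbers are placeholders rather than real data.

```python
from typing import Dict


def proportional_allocation(cases: Dict[str, int], total_doses: int) -> Dict[str, int]:
    """Split total_doses across regions in proportion to reported cases.

    Hypothetical helper, not taken from the paper. Integer arithmetic with
    largest-remainder rounding keeps the allocations summing to total_doses.
    """
    total_cases = sum(cases.values())
    if total_cases <= 0:
        raise ValueError("total case count must be positive")

    # Whole doses each region is entitled to, plus the fractional remainder
    # expressed as an integer numerator over total_cases.
    floors = {s: total_doses * c // total_cases for s, c in cases.items()}
    remainders = {s: total_doses * c % total_cases for s, c in cases.items()}

    # Hand out the doses lost to rounding, largest remainder first.
    leftover = total_doses - sum(floors.values())
    by_remainder = sorted(cases, key=lambda s: remainders[s], reverse=True)
    for state in by_remainder[:leftover]:
        floors[state] += 1
    return floors


# Placeholder incidence figures (assumed, NOT real data) for the five states
# named in the abstract.
hypothetical_cases = {
    "Assam": 100,
    "Delhi": 300,
    "Jharkhand": 50,
    "Maharashtra": 500,
    "Nagaland": 50,
}

allocation = proportional_allocation(hypothetical_cases, total_doses=10_000)
for state, doses in allocation.items():
    print(f"{state}: {doses}")
```

A baseline of this kind uses only current incidence. Per the abstract, VacSIM is evaluated against it using a learned policy that can be adjusted online.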

