Active screening is a common approach to controlling the spread of recurring infectious diseases such as tuberculosis and influenza. In this approach, health workers periodically select a subset of the population for screening. However, given the limited number of health workers, only a small fraction of the population can be visited in any given time period. Given the recurrent nature of the disease and its rapid spread, the goal is to minimize the number of infections over a long time horizon. Active screening can be formalized as a sequential combinatorial optimization over the network of people and their connections. The main computational challenges in this formalization arise from i) the combinatorial nature of the problem, ii) the need for sequential planning, and iii) the uncertainty in the infectiousness states of the population. Previous work on active screening fails to scale to large time horizons while fully accounting for the future effects of current interventions. In this paper, we propose a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQN), with several innovative adaptations designed to address these challenges. First, we use graph convolutional networks (GCNs) to represent the Q-function, exploiting the node correlations of the underlying contact network. Second, to avoid solving a combinatorial optimization problem in each time period, we decompose the node-set selection into a sub-sequence of decisions and design a two-level RL framework that solves the problem hierarchically. Finally, to speed up the slow convergence of RL caused by reward sparseness, we incorporate ideas from curriculum learning into our hierarchical RL approach. We evaluate our RL algorithm on several real-world networks. A minimal sketch of the two core ideas, a GCN-parameterized Q-function and node-by-node selection, follows below.
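The sketch below is an illustrative outline, not the authors' implementation: it shows how a Q-function over a contact network could be parameterized with a simple GCN and how the combinatorial set selection could be decomposed into a sequence of single-node decisions. The class name `GCNQNetwork`, the node-feature layout, the row-normalized adjacency propagation, and the "already selected" feature flag are all assumptions made for illustration.

```python
# Illustrative sketch only (assumed names and feature layout, not the paper's code).
import torch
import torch.nn as nn


class GCNQNetwork(nn.Module):
    """Maps per-node health-state features to one Q-value per node."""

    def __init__(self, num_features: int, hidden_dim: int = 64):
        super().__init__()
        self.w1 = nn.Linear(num_features, hidden_dim)
        self.w2 = nn.Linear(hidden_dim, hidden_dim)
        self.q_head = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, num_features) node states, e.g. belief of infection
        # adj: (num_nodes, num_nodes) adjacency with self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        a_norm = adj / deg  # simple row-normalized propagation (assumption)
        h = torch.relu(a_norm @ self.w1(x))
        h = torch.relu(a_norm @ self.w2(h))
        return self.q_head(h).squeeze(-1)  # (num_nodes,) Q-value per node


def select_screening_set(q_net, x, adj, budget: int):
    """Greedy node-by-node selection: instead of scoring all k-subsets at once,
    pick one node at a time, mirroring the sub-sequence decomposition above."""
    chosen = []
    mask = torch.zeros(x.shape[0], dtype=torch.bool)
    with torch.no_grad():
        for _ in range(budget):
            q = q_net(x, adj)
            q[mask] = float("-inf")  # do not re-select a node in this round
            node = int(q.argmax())
            chosen.append(node)
            mask[node] = True
            x = x.clone()
            x[node, -1] = 1.0  # assumed feature marking "selected this round"
    return chosen
```

In this decomposition, each screening round of budget k becomes k smaller decisions, which avoids enumerating node subsets and is what makes a DQN-style Q-function applicable at each step.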