We study a sequential decision-making problem on an $n$-node graph $\mathcal{G}$ where each node has an unknown label from a finite set $\mathbf{\Omega}$, drawn from a joint distribution $\mathcal{P}$ that is Markov with respect to $\mathcal{G}$. At each step, selecting a node reveals its label and yields a label-dependent reward. The goal is to adaptively choose nodes to maximize the expected cumulative discounted reward. We impose a frontier exploration constraint, under which actions are limited to neighbors of previously selected nodes, reflecting practical constraints in settings such as contact tracing and robotic exploration. We design a Gittins index-based policy that applies to general graphs and is provably optimal when $\mathcal{G}$ is a forest. Our implementation runs in $\mathcal{O}(n^2 \cdot |\mathbf{\Omega}|^2)$ time while using $\mathcal{O}(n \cdot |\mathbf{\Omega}|^2)$ oracle calls to $\mathcal{P}$ and $\mathcal{O}(n^2 \cdot |\mathbf{\Omega}|)$ space. Experiments on synthetic and real-world graphs show that our method consistently outperforms natural baselines, including in non-tree, budget-limited, and undiscounted settings. For example, in HIV testing simulations on real-world sexual interaction networks, our policy detects nearly all positive cases with only half the population tested, substantially outperforming all baselines.
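To make the frontier exploration constraint concrete, the following is a minimal sketch of the selection loop, assuming a generic priority function `index` in place of the paper's actual Gittins index computation (which requires posterior calculations against $\mathcal{P}$); the graph representation, reward dictionary, and function names here are illustrative, not from the paper.

```python
def frontier_exploration(adj, rewards, seed, index, steps, gamma=0.9):
    """Frontier-constrained greedy selection: starting from `seed`, at each
    step pick the frontier node (an unexplored neighbor of already-selected
    nodes) with the highest `index` score, reveal its reward, and apply
    geometric discounting by `gamma`. `index` stands in for a Gittins-style
    priority; here it is an arbitrary user-supplied scoring function."""
    selected = {seed}
    total = float(rewards[seed])          # reward of the seed, undiscounted
    frontier = set(adj[seed]) - selected  # nodes currently eligible for selection
    disc = gamma
    for _ in range(steps):
        if not frontier:
            break
        v = max(frontier, key=index)      # frontier node with highest priority
        selected.add(v)
        total += disc * rewards[v]
        disc *= gamma
        frontier |= set(adj[v])           # newly reachable neighbors join the frontier
        frontier -= selected              # already-selected nodes are never revisited
    return total, selected
```

Under the frontier constraint, a node only becomes selectable once one of its neighbors has been selected, which is why the frontier is updated with the neighbors of each newly chosen node.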