In causal bandit problems, the action set consists of interventions on the variables of a causal graph. Several recent works have studied such bandit problems and pointed out their practical applications. However, all existing works rely on the restrictive and impractical assumption that the learner is given full knowledge of the causal graph structure upfront. In this paper, we develop novel causal bandit algorithms that do not require knowledge of the causal graph. Our algorithms work well for causal trees, causal forests, and a general class of causal graphs. The regret guarantees of our algorithms greatly improve upon those of standard multi-armed bandit (MAB) algorithms under mild conditions. Lastly, we prove that our mild conditions are necessary: without them, one cannot do better than standard MAB algorithms.