In combinatorial causal bandits (CCB), the learning agent chooses a subset of variables to intervene on in each round and collects feedback from the observed variables to minimize the expected regret or sample complexity. Previous works study this problem in both general causal models and binary generalized linear models (BGLMs); however, all of them require prior knowledge of the causal graph structure. This paper studies the CCB problem without the graph structure, on both binary general causal models and BGLMs. We first provide an exponential lower bound on the cumulative regret of the CCB problem on general causal models. To overcome the exponentially large parameter space, we then consider the CCB problem on BGLMs. We design a regret minimization algorithm for BGLMs that works even without the graph skeleton and show that it still achieves $O(\sqrt{T}\ln T)$ expected regret, asymptotically matching the state-of-the-art algorithms that rely on the graph structure. Moreover, we sacrifice the regret to $O(T^{\frac{2}{3}}\ln T)$ to remove the weight gap assumption hidden by the asymptotic notation. Finally, we discuss and provide algorithms for the pure exploration version of the CCB problem without the graph structure.
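To make the interaction protocol concrete, below is a minimal sketch of one feedback round on a toy BGLM. This is not the paper's algorithm; the graph, the parameters `theta`, and the choice of intervened set are all illustrative assumptions. Each non-root node fires with probability given by a logistic function of its parents, the agent forces the intervened nodes to 1, and the values of all nodes (including the reward node) are observed as feedback.

```python
import numpy as np

# Hypothetical 4-node BGLM: node 3 is the reward node Y.
rng = np.random.default_rng(0)
parents = {2: [0, 1], 3: [2]}
theta = {2: np.array([1.0, 0.8]), 3: np.array([1.5])}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def play_round(intervened):
    """Sample one round: intervened nodes are forced to 1, others follow the BGLM."""
    x = np.zeros(4)
    for node in range(4):
        if node in intervened:
            x[node] = 1.0                    # do-intervention sets the node to 1
        elif node in parents:
            p = sigmoid(theta[node] @ x[parents[node]])
            x[node] = rng.binomial(1, p)     # logistic activation given parents
        else:
            x[node] = rng.binomial(1, 0.5)   # root node without intervention
    return x                                 # full feedback; x[3] is the reward

# Two example rounds with different intervened sets; the observed node values
# would be used to estimate theta and drive regret minimization.
print(play_round({0, 1}), play_round({0}))
```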