In combinatorial causal bandits (CCB), the learning agent chooses to intervene on at most $K$ variables in each round, collects feedback from the observed variables, and aims to minimize the expected regret on the target variable $Y$. Unlike all prior studies on causal bandits, CCB must deal with an exponentially large action space. We study CCB in the context of binary generalized linear models (BGLMs), which provide a succinct parametric representation of the causal models. We present the algorithm BGLM-OFU for Markovian BGLMs (i.e., those with no hidden variables), based on maximum likelihood estimation, and show that it achieves $O(\sqrt{T}\log T)$ regret, where $T$ is the time horizon. For the special case of linear models with hidden variables, we apply causal inference techniques such as the do-calculus to convert the original model into a Markovian model, and then show that both our BGLM-OFU algorithm and another algorithm based on linear regression solve such linear models with hidden variables. Our novelty includes (a) considering the combinatorial intervention action space and general causal models, including ones with hidden variables, (b) integrating and adapting techniques from diverse studies such as generalized linear bandits and online influence maximization, and (c) not relying on unrealistic assumptions used in some prior studies, such as knowing the joint distribution of the parents of $Y$ under all interventions.