In combinatorial causal bandits (CCB), the learning agent chooses at most $K$ variables to intervene on in each round, collects feedback from the observed variables, and aims to minimize the expected regret on the target variable $Y$. We study CCB in the context of binary generalized linear models (BGLMs), which give a succinct parametric representation of the causal models. We present the algorithm BGLM-OFU for Markovian BGLMs (i.e., those without hidden variables) based on the maximum likelihood estimation method, and show that it achieves $O(\sqrt{T}\log T)$ regret, where $T$ is the time horizon. For the special case of linear models with hidden variables, we apply causal inference techniques such as the do-calculus to convert the original model into a Markovian model, and then show that both our BGLM-OFU algorithm and another algorithm based on linear regression solve such linear models with hidden variables. Our novelty includes (a) considering the combinatorial intervention action space and general causal models, including ones with hidden variables, (b) integrating and adapting techniques from diverse studies such as generalized linear bandits and online influence maximization, and (c) avoiding unrealistic assumptions made in prior studies (such as knowing the joint distribution of the parents of $Y$ under all interventions) and regret factors exponential in the causal graph size.
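The "optimism in the face of uncertainty" (OFU) principle underlying BGLM-OFU can be illustrated with a minimal generalized linear bandit sketch. This is hypothetical illustrative code, not the paper's algorithm: it uses a single logistic arm model rather than a full causal BGLM, a fixed assumed confidence-width parameter `beta`, and a plain Newton-step MLE; all function names and parameters here are our own choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mle_logistic(X, y, n_iter=25, lam=1.0):
    """Regularized maximum likelihood estimate for a logistic model (Newton's method)."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta          # gradient of penalized NLL
        W = p * (1.0 - p)
        H = (X * W[:, None]).T @ X + lam * np.eye(d)  # Hessian, kept PD by lam*I
        theta -= np.linalg.solve(H, grad)
    return theta

def ofu_glm_bandit(arms, theta_star, T=400, beta=1.0, lam=1.0, seed=0):
    """OFU loop: each round, pick the arm maximizing (estimated mean + confidence bonus)."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)            # regularized design matrix
    X_hist, y_hist, picks = [], [], []
    theta_hat = np.zeros(d)
    for _ in range(T):
        # Confidence bonus per arm: beta * sqrt(x^T V^{-1} x)
        V_inv = np.linalg.inv(V)
        bonus = beta * np.sqrt(np.einsum('ad,dk,ak->a', arms, V_inv, arms))
        ucb = sigmoid(arms @ theta_hat) + bonus
        a = int(np.argmax(ucb))    # optimistic arm choice
        picks.append(a)
        x = arms[a]
        y = float(rng.random() < sigmoid(x @ theta_star))  # Bernoulli feedback
        V += np.outer(x, x)
        X_hist.append(x)
        y_hist.append(y)
        theta_hat = mle_logistic(np.array(X_hist), np.array(y_hist), lam=lam)
    return theta_hat, picks
```

As in generalized linear bandits, the exploration bonus shrinks like $1/\sqrt{n_a}$ as an arm accumulates pulls, so suboptimal arms are tried only a bounded number of times while the estimate of the best arm's parameter tightens.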