We consider Bayesian optimization in settings where observations can be adversarially biased, for example by an uncontrolled hidden confounder. Our first contribution is a reduction of the confounded setting to the dueling bandit model. We then propose a novel approach for dueling bandits based on information-directed sampling (IDS). This yields the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees. Our analysis further generalizes a previously proposed semi-parametric linear bandit model to non-linear reward functions and uncovers interesting links to doubly-robust estimation.