Efficient online decision-making in contextual bandits is challenging, as methods without informative priors often suffer from computational or statistical inefficiencies. In this work, we leverage pre-trained diffusion models as expressive priors to capture complex action dependencies, and we develop a practical algorithm that efficiently approximates posteriors under such priors, enabling both fast posterior updates and efficient sampling. Empirical results demonstrate the effectiveness and versatility of our approach across diverse contextual bandit settings.