Gaussian Imagination in Bandit Learning

Assuming distributions are Gaussian often facilitates computations that are otherwise intractable. We study the performance of an agent that attains a bounded information ratio with respect to a bandit environment with a Gaussian prior distribution and a Gaussian likelihood function when applied instead to a Bernoulli bandit. Relative to an information-theoretic bound on the Bayesian regret the agent would incur when interacting with the Gaussian bandit, we bound the increase in regret when the agent interacts with the Bernoulli bandit. If the Gaussian prior distribution and likelihood function are sufficiently diffuse, this increase grows at a rate which is at most linear in the square-root of the time horizon, and thus the per-timestep increase vanishes. Our results formalize the folklore that so-called Bayesian agents remain effective when instantiated with diffuse misspecified distributions.
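The setting can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the paper's method. It assumes Thompson sampling as the "Gaussian-imagining" agent; the abstract does not name a specific agent, but Thompson sampling is a standard example of an agent with a bounded information ratio. The function name `gaussian_ts_on_bernoulli` and its parameters `prior_var` and `noise_var` are hypothetical. The agent keeps an independent Gaussian posterior for each arm and updates it as though every reward were drawn from N(theta_a, noise_var). The rewards it actually receives are Bernoulli, which is the misspecification studied above. Larger `prior_var` and `noise_var` correspond to more diffuse Gaussian assumptions.

```python
import numpy as np


def gaussian_ts_on_bernoulli(true_means, horizon, prior_var=1.0, noise_var=1.0, seed=0):
    """Hypothetical sketch: Gaussian Thompson sampling run on a Bernoulli bandit.

    The agent assumes a N(0, prior_var) prior on each arm's mean and a
    N(theta_a, noise_var) likelihood, but rewards are really Bernoulli(true_means[a]).
    Returns the cumulative (pseudo-)regret over `horizon` steps.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    k = len(true_means)

    # Per-arm Gaussian posterior, stored as mean and precision (1 / variance).
    post_mean = np.zeros(k)
    post_prec = np.full(k, 1.0 / prior_var)

    best = true_means.max()
    regret = 0.0
    for _ in range(horizon):
        # Thompson sampling: draw one sample per arm from its Gaussian posterior
        # and act greedily with respect to the sampled means.
        samples = rng.normal(post_mean, 1.0 / np.sqrt(post_prec))
        a = int(np.argmax(samples))

        # Environment: the observed reward is Bernoulli, not Gaussian (misspecified).
        r = float(rng.random() < true_means[a])

        # Conjugate Gaussian update, computed as if r ~ N(theta_a, noise_var).
        new_prec = post_prec[a] + 1.0 / noise_var
        post_mean[a] = (post_prec[a] * post_mean[a] + r / noise_var) / new_prec
        post_prec[a] = new_prec

        # Accumulate expected regret of the chosen arm relative to the best arm.
        regret += best - true_means[a]
    return regret


if __name__ == "__main__":
    for horizon in (100, 1000, 10000):
        total = gaussian_ts_on_bernoulli([0.2, 0.5, 0.8], horizon, prior_var=10.0)
        print(horizon, total, total / horizon)
```

Printing the per-step regret `total / horizon` for growing horizons is one way to look informally at the claim that the per-timestep cost of misspecification vanishes. A single seeded run is only an illustration and does not substitute for the Bayesian regret bound itself.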

翻译：假设分配是Gaussian, 通常会便利本可难处理的计算。我们研究一个代理人的性能,该代理人在盗匪环境方面达到一个封闭的信息比率,先是Gaussian的分布,然后是Gaussian的概率函数,转而是Bernoulli的盗匪。相对于贝叶斯人悔恨上的信息理论,该代理人在与Gaussian的土匪互动时会产生的负数,当该代理人与Bernoulli土匪发生互动时,我们将增加的遗憾捆绑起来。如果Gaussian先前的分布和可能性功能足够分散,这种增加的速度会以时间范围平原的最多线性速度增长,因此,每步数会逐渐消失。我们的结果正式化了所谓的Bayesian的民俗,即所谓的巴伊斯人代理人在与扩散错误分配时仍然有效。