Under the Bayesian brain hypothesis, behavioural variations can be attributed to different priors over generative model parameters. This provides a formal explanation for why individuals exhibit inconsistent behavioural preferences when confronted with similar choices. For example, greedy preferences are a consequence of confident (or precise) beliefs over certain outcomes. Here, we offer an alternative account of behavioural variability using R\'enyi divergences and their associated variational bounds. R\'enyi bounds are analogous to the variational free energy (or evidence lower bound) and can be derived under the same assumptions. Importantly, these bounds provide a formal way to establish behavioural differences through an $\alpha$ parameter, given fixed priors. This rests on changes in $\alpha$ that alter the bound (on a continuous scale), inducing different posterior estimates and consequent variations in behaviour. Thus, it looks as if individuals have different priors and have reached different conclusions. More specifically, optimisation with $\alpha \to 0^{+}$ leads to mass-covering variational estimates and increased variability in choice behaviour, whereas optimisation with $\alpha \to + \infty$ leads to mass-seeking variational posteriors and greedy preferences. We exemplify this formulation through simulations of the multi-armed bandit task. We note that these $\alpha$ parameterisations may be especially relevant, i.e., may shape preferences, when the true posterior is not in the same family of distributions as the assumed (simpler) approximate density, which may be the case in many real-world scenarios. The ensuing departure from vanilla variational inference provides a potentially useful explanation for differences in the behavioural preferences of biological (or artificial) agents, under the assumption that the brain performs variational Bayesian inference.
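The mass-covering versus mass-seeking behaviour of the two $\alpha$ regimes can be illustrated with a toy numerical sketch (this is an illustrative construction, not the paper's bandit simulations): we fit a single Gaussian $q$ to a bimodal target $p$ by minimising the R\'enyi divergence $D_{\alpha}(q \,\|\, p) = \frac{1}{\alpha - 1} \log \int q(x)^{\alpha}\, p(x)^{1-\alpha}\, dx$ over a grid of candidate means and standard deviations. The bimodal mixture target, the Gaussian variational family, and the grid search are all assumptions made purely for illustration. With $\alpha$ near $0$ the best fit is a broad density straddling both modes; with large $\alpha$ (a finite proxy for $\alpha \to +\infty$) it locks onto a single mode.

```python
import numpy as np
from scipy.special import logsumexp

# Illustrative sketch (not the paper's code): fit a single Gaussian q to a
# bimodal target p by minimising the Renyi divergence
#   D_alpha(q || p) = 1/(alpha - 1) * log \int q^alpha p^(1-alpha) dx
# over a grid of candidate (mu, sigma), for two settings of alpha.

x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]

def log_normal(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

# Target: equal mixture of N(-2, 0.5^2) and N(+2, 0.5^2) -- bimodal, so the
# true posterior analogue is outside the single-Gaussian family.
log_p = logsumexp(
    np.stack([np.log(0.5) + log_normal(x, -2.0, 0.5),
              np.log(0.5) + log_normal(x, 2.0, 0.5)]), axis=0)

def renyi_div(log_q, log_p, alpha):
    # Numerical integration carried out in log space for stability.
    log_integral = logsumexp(alpha * log_q + (1 - alpha) * log_p) + np.log(dx)
    return log_integral / (alpha - 1)

def best_fit(alpha):
    # Brute-force grid search over the Gaussian family's parameters.
    best = (np.inf, None, None)
    for mu in np.linspace(-3.0, 3.0, 61):
        for sigma in np.linspace(0.2, 3.0, 57):
            d = renyi_div(log_normal(x, mu, sigma), log_p, alpha)
            if d < best[0]:
                best = (d, mu, sigma)
    return best[1], best[2]

mu_cover, sigma_cover = best_fit(alpha=0.05)  # mass-covering regime
mu_seek, sigma_seek = best_fit(alpha=20.0)    # mass-seeking regime
print(mu_cover, sigma_cover)  # broad q straddling both modes (mu near 0)
print(mu_seek, sigma_seek)    # narrow q locked onto one mode (|mu| near 2)
```

Under these assumptions, the small-$\alpha$ fit approaches moment matching (mean $0$, large variance covering both modes), while the large-$\alpha$ fit is zero-forcing: it avoids placing mass where $p$ has little, mirroring the greedy, low-variability preferences described above.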