An empirical evaluation of active inference in multi-armed bandits

A key feature of sequential decision making under uncertainty is the need to balance exploitation (choosing the best action according to current knowledge) against exploration (obtaining information about the values of other actions). The multi-armed bandit problem, a classical task that captures this trade-off, has served in machine learning as a vehicle for developing bandit algorithms that have proved useful in numerous industrial applications. The active inference framework, an approach to sequential decision making recently developed in neuroscience for understanding human and animal behaviour, is distinguished by its sophisticated strategy for resolving the exploration-exploitation trade-off. This makes active inference an exciting alternative to already established bandit algorithms. Here we derive an efficient and scalable approximate active inference algorithm and compare it to two state-of-the-art bandit algorithms, Bayesian upper confidence bound and optimistic Thompson sampling, on two types of bandit problems: a stationary bandit and a dynamic switching bandit. Our empirical evaluation shows that the active inference algorithm does not produce efficient long-term behaviour in stationary bandits. However, in the more challenging switching bandit problem, active inference performs substantially better than the two bandit algorithms. The results open exciting avenues for further research in theoretical and applied machine learning, and lend additional credibility to active inference as a general framework for studying human and animal behaviour.
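
For readers unfamiliar with the baselines, the following is a minimal sketch of one of them, optimistic Thompson sampling, on a stationary Bernoulli bandit. It is not the authors' implementation. The Beta(1, 1) priors, the arm probabilities, the trial count, and the names `optimistic_thompson_step` and `run_stationary_bandit` are all assumptions made for illustration. The "optimistic" step is assumed to mean that each posterior sample is clipped from below at the posterior mean.

```python
import random


def optimistic_thompson_step(alpha, beta, rng):
    """Choose an arm from Beta posteriors (hypothetical helper).

    Each arm's score is a posterior sample, but never lower than the
    posterior mean, which is the assumed 'optimistic' modification.
    """
    scores = []
    for a, b in zip(alpha, beta):
        sample = rng.betavariate(a, b)
        mean = a / (a + b)
        scores.append(max(sample, mean))
    # Index of the highest-scoring arm.
    return max(range(len(scores)), key=scores.__getitem__)


def run_stationary_bandit(probs, n_trials=2000, seed=0):
    """Run the agent on fixed reward probabilities; return cumulative regret."""
    rng = random.Random(seed)
    n_arms = len(probs)
    # Beta(1, 1) prior on every arm (an assumption of this sketch).
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    best = max(probs)
    regret = 0.0
    for _ in range(n_trials):
        arm = optimistic_thompson_step(alpha, beta, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        # Conjugate Beta-Bernoulli update for the chosen arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        # Expected regret of this choice relative to the best arm.
        regret += best - probs[arm]
    return regret


if __name__ == "__main__":
    print(run_stationary_bandit([0.2, 0.5, 0.8]))
```

Cumulative regret, the expected reward lost relative to always choosing the best arm, is a common way to compare bandit algorithms. In a switching bandit the reward probabilities change over time, so an agent would also need some form of forgetting or change detection, which this sketch omits.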
