We study dynamic allocation problems for discrete-time multi-armed bandits under uncertainty, based on the theory of nonlinear expectations. We show that, under strong independence of the bandits and with some relaxation in the definition of optimality, a Gittins allocation index gives optimal choices. This involves studying the interaction of our uncertainty with controls which determine the filtration. We also present a simple numerical example which illustrates the interaction between the agent's willingness to explore and its uncertainty aversion when making decisions.