In this paper, we study the strategic allocation of limited resources using a Colonel Blotto game (CBG) in a dynamic setting and analyze the problem with an online learning approach. In this model, one player is a learner who has a limited number of troops to allocate over a finite time horizon, and the other player is an adversary. At each stage, the learner plays a Colonel Blotto game against the adversary and strategically decides how to distribute its troops among the battlefields based on past observations. The adversary draws its allocation strategy at random from a fixed distribution that is unknown to the learner. The learner's objective is to minimize its regret, defined as the difference between the payoff of the best mixed strategy and the payoff the learner actually realizes by following a learning algorithm, while never violating its budget constraint. We analyze learning in the dynamic CBG within the frameworks of combinatorial bandits and bandits with knapsacks. We first convert the budget-constrained dynamic CBG into a path planning problem on a directed graph. We then devise an efficient algorithm that combines the combinatorial bandit algorithm Edge, applied to the path planning problem, with the bandits-with-knapsacks algorithm LagrangeBwK, which handles the budget constraint. Our theoretical analysis shows that the learner's regret is bounded by a term that is sublinear in the time horizon and polynomial in the other parameters. Finally, we corroborate our theoretical results with simulations of various scenarios.
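To illustrate why a Blotto allocation can be treated as a path in a directed graph, the sketch below builds a layered DAG for a single stage. It assumes the standard construction in which exactly `m` troops are split over `n` battlefields, node `(i, j)` means "`j` troops already placed on the first `i` battlefields," and an edge carrying label `k` means "place `k` troops on the next battlefield." The paper's own graph, which must also account for a per-stage budget that varies over time, may differ. All function names, the battlefield values, and the strict-win tie rule in `stage_payoff` are hypothetical choices made for this example.

```python
from functools import lru_cache
from math import comb


def build_blotto_dag(n, m):
    """Hypothetical layered DAG: node (i, j) = j troops used on the first i battlefields.

    Edge ((i, j), (i + 1, j + k), k) means 'put k troops on battlefield i + 1'.
    The last layer is forced into the sink (n, m), so every source-to-sink path
    is a complete allocation that uses exactly m troops (assumption).
    """
    edges = []
    for i in range(n):
        sources = [0] if i == 0 else range(m + 1)  # only (0, 0) is reachable in layer 0
        for j in sources:
            ks = [m - j] if i == n - 1 else range(m - j + 1)
            for k in ks:
                edges.append(((i, j), (i + 1, j + k), k))
    return edges


def enumerate_allocations(edges, source, sink):
    """Yield the allocation (troops per battlefield) encoded by each source-to-sink path."""
    out = {}
    for u, v, k in edges:
        out.setdefault(u, []).append((v, k))

    def dfs(u, alloc):
        if u == sink:
            yield tuple(alloc)
            return
        for v, k in out.get(u, []):
            alloc.append(k)
            yield from dfs(v, alloc)
            alloc.pop()

    yield from dfs(source, [])


def count_paths(edges, source, sink):
    """Count source-to-sink paths by dynamic programming (no enumeration)."""
    out = {}
    for u, v, _ in edges:
        out.setdefault(u, []).append(v)

    @lru_cache(maxsize=None)
    def paths_from(u):
        if u == sink:
            return 1
        return sum(paths_from(v) for v in out.get(u, []))

    return paths_from(source)


def stage_payoff(learner, adversary, values):
    """Hypothetical stage payoff: sum of values of battlefields the learner wins strictly.

    It is additive over battlefields, i.e. over the edges of the path.
    """
    return sum(w for a, b, w in zip(learner, adversary, values) if a > b)


if __name__ == "__main__":
    n, m = 3, 2
    edges = build_blotto_dag(n, m)
    allocs = list(enumerate_allocations(edges, (0, 0), (n, m)))
    print(len(edges), len(allocs), comb(m + n - 1, n - 1))
    print(stage_payoff((2, 0, 0), (1, 1, 0), (1.0, 1.0, 1.0)))
```

In this construction the graph has at most on the order of `n * m**2` edges, while the number of paths (allocations) is `comb(m + n - 1, n - 1)`, which grows much faster. Because the payoff in this example decomposes over edges, a learner can keep statistics per edge instead of per allocation, which is the kind of structure an edge-based combinatorial bandit can exploit.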