An adversarial bandit problem with memory constraints is studied, in which only the statistics of a subset of arms can be stored. A hierarchical learning policy is developed whose memory requirement is sublinear in the number of arms. Regret bounds that are sublinear in the time horizon are established for both weak regret and shifting regret. This work appears to be the first on memory-constrained bandit problems in the adversarial setting.
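
For reference, the two regret notions named above can be written as follows. This is a sketch using the standard definitions from the adversarial bandit literature, not necessarily the exact form used in this work. The notation is assumed here: $K$ arms, horizon $T$, rewards $x_i(t) \in [0,1]$, the arm $I_t$ played at time $t$, and a budget $S$ on the number of switches allowed to the comparator.

```latex
% Weak regret: compare against the single best arm in hindsight.
R_{\mathrm{weak}}(T)
  = \max_{1 \le i \le K} \sum_{t=1}^{T} x_i(t)
    - \mathbb{E}\!\left[ \sum_{t=1}^{T} x_{I_t}(t) \right]

% Shifting regret: compare against the best arm sequence (j_1, ..., j_T)
% that changes arms at most S times (S is an assumed parameter).
R_{\mathrm{shift}}(T, S)
  = \max_{\substack{(j_1, \dots, j_T) \in \{1,\dots,K\}^T \\
                    \sum_{t=1}^{T-1} \mathbf{1}[j_t \ne j_{t+1}] \le S}}
    \sum_{t=1}^{T} x_{j_t}(t)
    - \mathbb{E}\!\left[ \sum_{t=1}^{T} x_{I_t}(t) \right]
```

With $S = 0$ the comparator cannot switch arms, so the shifting regret reduces to the weak regret. A regret order is sublinear when it grows as $o(T)$, so that the per-round regret vanishes as $T$ grows.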