Structured stochastic multi-armed bandits achieve faster regret rates than standard unstructured bandit problems. Most structured bandits, however, assume knowledge of a structural parameter, such as the Lipschitz constant, that is often unavailable in practice. To cope with this latent structural parameter, we consider a transfer learning setting, inspired by practical problems such as rate adaptation in wireless links, in which an agent must learn to transfer structural information from prior tasks to the next task. We propose a novel framework that provably and accurately estimates the Lipschitz constant from previous tasks and fully exploits it for the new task at hand. We analyze the efficiency of the proposed framework in two ways: (i) the sample complexity of our estimator matches the information-theoretic fundamental limit; and (ii) under mild assumptions, our regret bound on the new task is close to that of an oracle algorithm with full knowledge of the Lipschitz constant. Our analysis reveals a set of useful insights on transfer learning for latent Lipschitz constants, such as the fundamental challenge a learner faces. Our numerical evaluations confirm our theoretical findings and show the superiority of the proposed framework over baselines.
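To make the idea of "estimating the Lipschitz constant from previous tasks" concrete, here is a minimal sketch of a naive plug-in estimator. It is not the estimator proposed in the paper, which is not reproduced here. It assumes (hypothetically) that arms are points on a line, that the mean reward is Lipschitz in the arm position, and that each previous task has produced an empirical mean reward per arm. The sketch returns the largest observed slope |mu_i - mu_j| / |x_i - x_j| across all tasks and arm pairs. The function name `plugin_lipschitz_estimate` is illustrative only.

```python
import itertools
from typing import Sequence


def plugin_lipschitz_estimate(
    arm_positions: Sequence[float],
    task_means: Sequence[Sequence[float]],
) -> float:
    """Naive plug-in estimate of a Lipschitz constant shared across tasks.

    Hypothetical setup (not from the paper): arms live on a 1-D line at
    `arm_positions`, and `task_means[t][i]` is the empirical mean reward of
    arm i in previous task t. Returns the maximum slope between any two arms
    over all previous tasks.
    """
    best = 0.0
    for means in task_means:
        if len(means) != len(arm_positions):
            raise ValueError("each task needs one empirical mean per arm")
        # Compare every pair of arms within this task.
        for i, j in itertools.combinations(range(len(arm_positions)), 2):
            gap = abs(arm_positions[i] - arm_positions[j])
            if gap == 0:
                continue  # skip co-located arms to avoid division by zero
            best = max(best, abs(means[i] - means[j]) / gap)
    return best


# Example: three arms at 0.0, 0.5, 1.0 and two previous tasks.
estimate = plugin_lipschitz_estimate(
    [0.0, 0.5, 1.0],
    [[0.2, 0.4, 0.5], [0.9, 0.6, 0.4]],
)
print(estimate)  # approximately 0.6 (from the second task's first two arms)
```

Because empirical means are noisy, this plug-in maximum tends to overestimate the true constant when arms are close together and samples are few. That is why an estimator with sample-complexity guarantees, such as the one the abstract describes, is needed before the estimate can be safely exploited on a new task.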