Stochastic Lipschitz bandit algorithms balance exploration and exploitation, and have been used for a variety of important task domains. In this paper, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and arm-space. Due to this flexibility, the algorithm is able to efficiently optimize rewards and minimize regret, by focusing on the portions of the space that are most relevant. In our analysis, we link tree-based methods to Gaussian processes. In light of our analysis, we design a novel hierarchical Bayesian model for Lipschitz bandit problems. Our experiments show that our algorithms can achieve state-of-the-art performance in challenging real-world tasks such as neural network hyperparameter tuning.
翻译:Stochastic Lipschitz 土匪算法平衡了勘探和开发,并被用于各种重要任务领域。 在本文中,我们提出了一个利普西茨土匪方法框架,用于适应性地学习上下文和手臂空间的分隔。由于这种灵活性,这种算法能够有效地优化奖励和减少遗憾,方法是集中关注最相关的空间部分。在我们的分析中,我们把基于树的方法与高西亚进程联系起来。根据我们的分析,我们设计了一个新型的Bayesian 等级模型,用于解决利普西茨土匪问题。我们的实验表明,我们的算法可以在神经网络超分光仪调等挑战现实世界的任务中实现最先进的性能。