Obtaining reliable, adaptive confidence sets for prediction functions (hypotheses) is a central challenge in sequential decision-making tasks, such as bandits and model-based reinforcement learning. These confidence sets typically rely on prior assumptions on the hypothesis space, e.g., the known kernel of a Reproducing Kernel Hilbert Space (RKHS). Hand-designing such kernels is error-prone, and misspecification may lead to poor or unsafe performance. In this work, we propose to meta-learn a kernel from offline data (Meta-KeL). For the case where the unknown kernel is a combination of known base kernels, we develop an estimator based on structured sparsity. Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets that, with increasing amounts of offline data, become as tight as those given the true unknown kernel. We demonstrate our approach on the kernelized bandit problem (a.k.a. Bayesian optimization), where we establish regret bounds competitive with those given the true kernel. We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
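To make the structured-sparsity idea concrete, the sketch below illustrates one way to recover sparse non-negative weights for a combination of known base kernels from offline data. It is a minimal illustration, not the paper's exact Meta-KeL estimator: it regresses the empirical signal matrix y yᵀ on the vectorized base Gram matrices with an L1 penalty (a kernel-alignment-style surrogate), and all function names, lengthscales, and regularization values are hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sketch only: assume the unknown kernel is a sparse combination
#   k(x, x') = sum_j w_j k_j(x, x')
# of known base kernels k_j, and estimate w from offline data.

def rbf(lengthscale):
    """RBF base kernel with a fixed (hypothetical) lengthscale."""
    def k(X, Z):
        d2 = (X[:, None] - Z[None, :]) ** 2
        return np.exp(-d2 / (2.0 * lengthscale ** 2))
    return k

base_kernels = [rbf(0.1), rbf(0.5), rbf(2.0)]  # hypothetical base kernels

# Simulate offline data whose ground-truth kernel uses only one base kernel.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=40)
K_true = 0.7 * base_kernels[1](X, X)
y = rng.multivariate_normal(np.zeros(len(X)), K_true + 1e-3 * np.eye(len(X)))

# Regress vec(y y^T) on [vec(K_1), ..., vec(K_m)]; the L1 penalty with a
# positivity constraint induces the structured sparsity over base kernels.
features = np.stack([k(X, X).ravel() for k in base_kernels], axis=1)
target = np.outer(y, y).ravel()
w = Lasso(alpha=1e-2, positive=True, fit_intercept=False).fit(features, target).coef_
print("estimated weights:", w)  # mass should concentrate on the 2nd kernel
```

Under this sketch's assumptions, the learned combination sum_j w_j k_j then defines the estimated RKHS used downstream, e.g., to build confidence sets for a GP-UCB-style kernelized bandit algorithm.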