We consider the problem of sequentially maximising an unknown function over a set of actions while ensuring that every sampled point has a function value below a given safety threshold. We model the function using kernel-based and Gaussian process methods, but differ from previous works in assuming that the function is monotonically increasing with respect to a safety variable. This assumption is motivated by practical applications such as adaptive clinical trial design and robotics. Taking inspiration from the GP-UCB and SafeOpt algorithms, we propose an algorithm, monotone safe UCB (M-SafeUCB), for this task. We show that M-SafeUCB enjoys theoretical guarantees in terms of safety, a suitably defined regret notion, and approximately finding the entire safe boundary. In addition, we illustrate that the monotonicity assumption yields significant benefits in terms of both the guarantees obtained and algorithmic simplicity. We support our theoretical findings with empirical evaluations on a variety of functions.