Active learning methods have shown great promise in reducing the number of samples necessary for learning. As automated learning systems are adopted into real-time, real-world decision-making pipelines, it is increasingly important that such algorithms are designed with safety in mind. In this work we investigate the complexity of learning the best safe decision in interactive environments. We reduce this problem to a constrained linear bandits problem, where our goal is to find the best arm satisfying certain (unknown) safety constraints. We propose an algorithm based on adaptive experimental design, which we show efficiently trades off the difficulty of showing that an arm is unsafe against the difficulty of showing that it is suboptimal. To our knowledge, our results are the first on best-arm identification in linear bandits with safety constraints. In practice, we demonstrate that this approach performs well on synthetic and real-world datasets.
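To make the problem statement concrete, the following is a minimal sketch of one possible formalization of the best safe arm, not the algorithm proposed in this work. The abstract does not fix notation, so everything here is an assumption made for illustration: arms are vectors in R^d, `theta` is an unknown reward parameter, `mu` is an unknown safety parameter, and an arm `x` counts as safe when `x @ mu <= gamma`. The estimator uses naive uniform sampling with least squares, a simple baseline that an adaptive experimental design would aim to improve on.

```python
import numpy as np

def best_safe_arm(arms, theta, mu, gamma):
    """Return the index of the highest-reward arm with safety value <= gamma, or None."""
    rewards = arms @ theta          # expected reward of each arm
    safety = arms @ mu              # expected safety value of each arm
    safe = safety <= gamma          # assumed form of the safety constraint
    if not safe.any():
        return None
    # Mask out unsafe arms so they can never be selected.
    return int(np.argmax(np.where(safe, rewards, -np.inf)))

def estimate_by_uniform_sampling(arms, theta, mu, pulls_per_arm, noise_std, rng):
    """Naive baseline: pull every arm equally often, then fit both parameters by least squares."""
    X = np.repeat(arms, pulls_per_arm, axis=0)
    y_reward = X @ theta + noise_std * rng.standard_normal(len(X))
    y_safety = X @ mu + noise_std * rng.standard_normal(len(X))
    theta_hat, *_ = np.linalg.lstsq(X, y_reward, rcond=None)
    mu_hat, *_ = np.linalg.lstsq(X, y_safety, rcond=None)
    return theta_hat, mu_hat

# Hypothetical toy instance (all values are illustrative assumptions).
ARMS = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
THETA = np.array([1.0, 0.2])   # unknown to the learner in the real problem
MU = np.array([1.0, 0.1])      # unknown to the learner in the real problem
GAMMA = 0.8

# Arm 0 has the highest reward but violates the constraint, so the oracle picks arm 2.
oracle_choice = best_safe_arm(ARMS, THETA, MU, GAMMA)

rng = np.random.default_rng(0)
theta_hat, mu_hat = estimate_by_uniform_sampling(ARMS, THETA, MU, 200, 0.1, rng)
estimated_choice = best_safe_arm(ARMS, theta_hat, mu_hat, GAMMA)
```

In this toy instance the learner must rule out arm 0 as unsafe and arm 1 as suboptimal, which mirrors the two sources of difficulty named in the abstract. Uniform sampling spends the same budget on every arm regardless of which of these two questions is harder to resolve.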