We consider a sequential decision-making task in which we are not allowed to evaluate parameters that violate an a priori unknown (safety) constraint. A common approach is to place a Gaussian process (GP) prior on the unknown constraint and allow evaluations only in regions that are safe with high probability. Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case. Moreover, the way in which they exploit regularity assumptions about the constraint introduces an additional critical hyperparameter. In this paper, we propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate. Our approach is naturally applicable to continuous domains and does not require additional hyperparameters. We theoretically analyze the method and show that, with high probability, we do not violate the safety constraint and that we explore by learning about the constraint up to arbitrary precision. Empirical evaluations demonstrate improved data efficiency and scalability.
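As a rough illustration of the "safe with high probability" rule mentioned above, the following sketch computes a GP posterior over a toy constraint and marks a point as safe when a lower confidence bound on the constraint is nonnegative. It does not implement the information-theoretic criterion proposed here. The squared-exponential kernel, the confidence multiplier `beta`, the noise level, and the toy constraint f(x) = 1 - x^2 are all assumptions chosen for the example, and the grid serves only to query the posterior at a few points.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b (assumed kernel)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    # Solve K^{-1} y through the Cholesky factor for numerical stability.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def safe_mask(mean, std, beta=2.0):
    """True where the lower confidence bound mean - beta * std is nonnegative."""
    return mean - beta * std >= 0.0

# Hypothetical constraint f(x) = 1 - x^2, with f(x) >= 0 meaning "safe".
# A single known-safe seed observation at x = 0 starts the exploration.
x_train = np.array([0.0])
y_train = 1.0 - x_train**2
x_grid = np.linspace(-2.0, 2.0, 201)

mean, std = gp_posterior(x_train, y_train, x_grid)
mask = safe_mask(mean, std)
print("points currently classified as safe:", int(mask.sum()))
```

With a zero-mean prior, points far from any observation have a negative lower bound and are therefore treated as unsafe until nearby evaluations shrink the posterior uncertainty, which is why exploration has to expand outward from the safe seed.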