Optimizing noisy functions online, when evaluating the objective requires experiments on a deployed system, is a crucial task arising in manufacturing, robotics, and many other domains. Often, constraints on safe inputs are unknown ahead of time, and we only obtain noisy information indicating how close we are to violating them. Yet safety must be guaranteed at all times, not only for the final output of the algorithm. We introduce a general approach for seeking a stationary point in high-dimensional non-linear stochastic optimization problems in which maintaining safety during learning is crucial. Our approach, called LB-SGD, is based on applying stochastic gradient descent (SGD) with a carefully chosen adaptive step size to a logarithmic barrier approximation of the original problem. We provide a complete convergence analysis for non-convex, convex, and strongly convex smooth constrained problems, with both first-order and zeroth-order feedback. Our approach yields efficient updates and scales better with dimensionality than existing approaches. We empirically compare the sample complexity and computational cost of our method with those of existing safe learning approaches. Beyond synthetic benchmarks, we demonstrate the effectiveness of our approach at minimizing constraint violation in policy search tasks in safe reinforcement learning (RL).
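To make the idea concrete, the following is a minimal sketch of log-barrier gradient descent on a toy problem, not the authors' exact LB-SGD: the barrier parameter, the particular step-size surrogate, and the toy objective are all illustrative assumptions. It shows the two ingredients the abstract names, a logarithmic barrier approximation of the constrained problem and an adaptive step size that keeps every iterate strictly feasible (safe).

```python
import numpy as np

def grad_barrier(grad_f, g, grad_g, x, eta):
    """Gradient of the log-barrier B(x) = f(x) - eta * sum_i log(-g_i(x))."""
    gi = g(x)        # constraint values; iterates keep g_i(x) < 0 (safe)
    gg = grad_g(x)   # shape (m, d): one gradient row per constraint
    return grad_f(x) + eta * np.sum(gg / (-gi)[:, None], axis=0)

def lb_gd(grad_f, g, grad_g, x0, eta=0.1, iters=200):
    """Barrier gradient descent with an adaptive, safety-preserving step."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        d = grad_barrier(grad_f, g, grad_g, x, eta)
        # Adaptive step size (illustrative surrogate): cap the move by half
        # the margin to the constraint boundary, so no step can leave the
        # feasible region.
        margin = np.min(-g(x))
        step = min(0.1, 0.5 * margin / (np.linalg.norm(d) + 1e-12))
        x -= step * d
    return x

# Toy problem: minimize ||x - c||^2 subject to ||x||^2 <= 1,
# starting from a strictly feasible (safe) point.
c = np.array([2.0, 0.0])
grad_f = lambda x: 2 * (x - c)
g = lambda x: np.array([x @ x - 1.0])   # constraint g(x) <= 0
grad_g = lambda x: np.array([2 * x])
x_star = lb_gd(grad_f, g, grad_g, x0=[0.0, 0.0])
print(x_star)  # stays strictly inside the unit ball, pushed toward c
```

Because each step is bounded by half the distance to the constraint boundary, every iterate remains strictly feasible; the actual method additionally handles noisy (first- or zeroth-order) feedback, which this deterministic sketch omits.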