We consider the problem of decision-making under uncertainty in an environment with safety constraints. Many business and industrial applications rely on real-time optimization to improve key performance indicators. When the characteristics of the system are unknown, real-time optimization becomes challenging, particularly because safety constraints must still be satisfied. We propose the ARTEO algorithm, in which we cast multi-armed bandits as a mathematical programming problem subject to safety constraints and learn the unknown characteristics through exploration while optimizing the targets. We quantify the uncertainty in the unknown characteristics using Gaussian processes and incorporate it into the cost function as a contribution that drives exploration. We adaptively control the size of this contribution according to the requirements of the environment. We guarantee the safety of our algorithm with high probability through confidence bounds constructed under the regularity assumptions of Gaussian processes. We demonstrate the safety and efficiency of our approach with two case studies: optimization of electric motor current and real-time bidding. We further compare the performance of ARTEO with a safe variant of upper confidence bound based algorithms. ARTEO achieves lower cumulative regret with accurate and safe decisions.
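The idea of adding Gaussian process uncertainty to the cost while restricting decisions to a high-probability safe set can be illustrated with a minimal sketch. This is not the ARTEO implementation: the RBF kernel, the squared-error tracking cost, the grid of candidate decisions, and the names `gp_posterior`, `choose_safe_decision`, `beta` (confidence-bound width), and `z` (size of the exploration contribution) are all assumptions made here for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return variance * np.exp(-0.5 * ((a - b) / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    y_train = np.asarray(y_train, dtype=float)
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(K, K_s)            # K^{-1} K_s
    mean = solved.T @ y_train
    var = 1.0 - np.sum(K_s * solved, axis=0)    # prior variance is 1.0
    return mean, np.sqrt(np.clip(var, 0.0, None))


def choose_safe_decision(x_train, y_train, candidates, target,
                         safety_limit, beta=2.0, z=1.0):
    """Pick the candidate minimizing (tracking cost - z * uncertainty)
    among candidates whose upper confidence bound respects the limit.

    Hypothetical setup: the unknown output must stay below safety_limit,
    and we want it close to target. Returns None if no candidate is safe.
    """
    candidates = np.asarray(candidates, dtype=float)
    mean, std = gp_posterior(x_train, y_train, candidates)
    safe = mean + beta * std <= safety_limit    # high-probability safe set
    if not safe.any():
        return None
    cost = (mean - target) ** 2 - z * std       # uncertainty rewards exploring
    cost = np.where(safe, cost, np.inf)         # exclude unsafe candidates
    return float(candidates[np.argmin(cost)])


# Usage: two safe observations of an unknown function, then one decision.
x_obs = np.array([0.0, 1.0])
y_obs = np.array([0.0, 1.0])
grid = np.linspace(0.0, 3.0, 31)
decision = choose_safe_decision(x_obs, y_obs, grid, target=2.0, safety_limit=1.5)
```

In this sketch a larger `z` pushes the decision toward uncertain regions and `z = 0` recovers pure optimization of the tracking cost; how that weight is adapted over time is specific to the algorithm and is not reproduced here.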