Zeroth-order optimization (ZO) algorithms have recently been used to solve black-box or simulation-based learning and control problems, where the gradient of the objective function cannot be easily computed but can be approximated from objective function values. Many existing ZO algorithms adopt two-point feedback schemes because they converge faster than one-point feedback schemes. However, two-point schemes require two evaluations of the objective function at each iteration, which can be impractical in applications where the data are not all available a priori, e.g., in online optimization. In this paper, we propose a novel one-point feedback scheme that queries the function value only once at each iteration and estimates the gradient using the residual between the values queried at two consecutive iterations. When optimizing a deterministic Lipschitz function, we show that the query complexity of ZO with the proposed one-point residual feedback matches that of ZO with existing two-point schemes. Moreover, the query complexity of the proposed algorithm can be further improved when the objective function has a Lipschitz gradient. Then, for stochastic bandit optimization problems where only noisy objective function values are available, we show that ZO with one-point residual feedback achieves the same convergence rate as the two-point scheme with uncontrollable data samples. We demonstrate the effectiveness of the proposed one-point residual feedback via extensive numerical experiments.
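To make the single-query pattern concrete, the following is a minimal sketch (not the paper's pseudocode) of one ZO update with a one-point residual-feedback estimator of the form $g_t = \frac{u_t}{\delta}\left(f(x_t + \delta u_t) - f(x_{t-1} + \delta u_{t-1})\right)$, assuming Gaussian perturbation directions; the function names, smoothing parameter, and step size below are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def residual_feedback_step(f, x, prev_query_value, delta, step_size, rng):
    """One ZO update using one-point residual feedback (illustrative sketch).

    Queries f exactly once; the gradient estimate reuses the function value
    queried at the previous iteration (prev_query_value).
    """
    d = x.shape[0]
    u = rng.standard_normal(d)                       # random perturbation direction
    value = f(x + delta * u)                         # the single query this iteration
    grad_est = (value - prev_query_value) / delta * u  # residual-feedback gradient estimate
    x_next = x - step_size * grad_est                # standard ZO gradient step
    return x_next, value                             # pass `value` forward as the next residual


# Example usage on a simple quadratic objective (hypothetical settings).
rng = np.random.default_rng(0)
f = lambda x: np.sum(x ** 2)
x = np.ones(10)
prev_val = f(x + 0.01 * rng.standard_normal(10))     # one initial query to seed the residual
for _ in range(1000):
    x, prev_val = residual_feedback_step(f, x, prev_val, delta=0.01, step_size=0.01, rng=rng)
```

Note that each loop iteration makes a single call to `f`, in contrast to two-point schemes, which would query both a perturbed and an unperturbed (or oppositely perturbed) point per iteration.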