Zeroth-order (ZO) optimization is widely used to handle challenging tasks, such as query-based black-box adversarial attacks and reinforcement learning. Various attempts have been made to integrate prior information into the finite-difference gradient estimation procedure, with promising empirical results. However, the convergence properties of these methods are not well understood. This paper attempts to fill this gap by analyzing the convergence of prior-guided ZO algorithms under a greedy descent framework with various gradient estimators. We provide a convergence guarantee for the prior-guided random gradient-free (PRGF) algorithms. Moreover, to accelerate beyond greedy descent methods, we present a new accelerated random search (ARS) algorithm that incorporates prior information, together with a convergence analysis. Finally, our theoretical results are confirmed by experiments on several numerical benchmarks as well as adversarial attacks.
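For intuition, here is a minimal sketch of a prior-guided finite-difference gradient estimator of the kind the abstract refers to. This is an illustrative assumption, not the paper's exact PRGF construction: the function name `prior_guided_rgf_estimate`, the mixing weight `alpha`, and the direction-mixing scheme are all hypothetical choices made for exposition.

```python
import numpy as np

def prior_guided_rgf_estimate(f, x, prior, mu=1e-3, n_samples=10, alpha=0.5):
    """Finite-difference gradient estimate whose random search directions
    are biased toward a (normalized) prior direction.

    Hypothetical sketch of the general idea only; `alpha` and the
    mixing scheme are assumptions, not the paper's PRGF estimator.
    """
    d = x.shape[0]
    p = prior / (np.linalg.norm(prior) + 1e-12)  # unit prior direction
    fx = f(x)
    g = np.zeros(d)
    for _ in range(n_samples):
        u = np.random.randn(d)
        u /= np.linalg.norm(u)
        # Mix the random direction with the prior, then renormalize.
        v = alpha * p + (1.0 - alpha) * u
        v /= np.linalg.norm(v)
        # Two-point forward finite difference along v.
        g += (f(x + mu * v) - fx) / mu * v
    return g / n_samples
```

In a greedy-descent loop of the kind analyzed in the paper, one plausible source of the prior is the previous iteration's gradient estimate, e.g. `g = prior_guided_rgf_estimate(f, x, prior=g_prev)` followed by a descent step `x -= lr * g`; this particular choice of prior is likewise an assumption for illustration.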