Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples of the same dimension as the global variable and/or evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm that leverages the network structure inherent in the optimization objective, allowing each agent to estimate its local gradient independently from local cost evaluations, without the use of any consensus protocol. The proposed algorithm adopts an asynchronous update scheme and is designed, based on the block coordinate descent method, for stochastic non-convex optimization over a possibly non-convex feasible domain. The algorithm is then employed as a distributed model-free RL algorithm for distributed linear quadratic regulator design, where a learning graph is constructed to describe the required interactions among agents during distributed learning. We empirically validate the proposed algorithm, benchmarking its convergence rate and variance against a centralized ZOO algorithm.