The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a solution to an equation of the form $\mathbf{f}(\boldsymbol{\theta}) = \mathbf{0}$, where $\mathbf{f} : \mathbb{R}^d \rightarrow \mathbb{R}^d$, when only noisy measurements of $\mathbf{f}(\cdot)$ are available. In the literature to date, one can make a distinction between "synchronous" updating, whereby the entire vector of the current guess $\boldsymbol{\theta}_t$ is updated at each time, and "asynchronous" updating, whereby only one component of $\boldsymbol{\theta}_t$ is updated. In convex and nonconvex optimization, there is also the notion of "batch" updating, whereby some but not all components of $\boldsymbol{\theta}_t$ are updated at each time $t$. In addition, there is a distinction between using a "local" clock versus a "global" clock. In the literature to date, convergence proofs when a local clock is used assume that the measurement noise is an i.i.d.\ sequence, an assumption that does not hold in Reinforcement Learning (RL). In this note, we provide a general theory of convergence for batch asynchronous stochastic approximation (BASA) that works whether the updates use a local clock or a global clock, for the case where the measurement noises form a martingale difference sequence. This is the most general result to date and encompasses all others.
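To make the distinctions drawn above concrete, here is a minimal sketch of a BASA iteration on a toy problem. Everything in it is illustrative rather than taken from the paper: the oracle `noisy_f`, the batch size, and the step-size schedules are assumptions, with $\mathbf{f}(\boldsymbol{\theta}) = -\boldsymbol{\theta}$ chosen so the root is $\mathbf{0}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

def noisy_f(theta):
    """Hypothetical noisy oracle: f(theta) = -theta (root at 0) plus
    zero-mean noise standing in for a martingale difference sequence."""
    return -theta + 0.1 * rng.standard_normal(d)

theta = rng.standard_normal(d)
local_counts = np.zeros(d)  # per-component update counts (local clocks)

T = 5000
for t in range(1, T + 1):
    # "Batch" updating: a subset S_t of components is updated at time t;
    # |S_t| = d recovers synchronous updating, |S_t| = 1 asynchronous.
    S_t = rng.choice(d, size=2, replace=False)
    g = noisy_f(theta)
    for i in S_t:
        local_counts[i] += 1
        # Local clock: step size driven by component i's own update count.
        alpha = 1.0 / local_counts[i]
        # Global clock alternative: alpha = 1.0 / t
        theta[i] += alpha * g[i]

print(theta)  # should be close to the root 0 of f
```

Switching the commented line in or out toggles between the local-clock and global-clock step-size conventions that the convergence theory covers.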