Saddle Point Optimization with Approximate Minimization Oracle

A major approach to saddle point optimization $\min_x\max_y f(x, y)$ is the gradient-based approach popularized by generative adversarial networks (GANs). In contrast, we analyze an alternative approach that relies only on an oracle that approximately solves a minimization problem. At a given point $(x, y)$, our approach locates approximate solutions $x'$ and $y'$ to $\min_{x'}f(x', y)$ and $\max_{y'}f(x, y')$ and updates $(x, y)$ toward $(x', y')$ with a learning rate $\eta$. On locally strongly convex--concave smooth functions, we derive conditions on $\eta$ under which the method converges linearly to a local saddle point; these conditions reveal a possible shortcoming of recently developed robust adversarial reinforcement learning algorithms. We develop a heuristic approach to adapt $\eta$ in a derivative-free manner and implement zero-order and first-order minimization algorithms. Numerical experiments are conducted to show the tightness of the theoretical results as well as the usefulness of the $\eta$ adaptation mechanism.
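The update described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy quadratic $f(x, y) = \frac{a}{2}x^2 + bxy - \frac{c}{2}y^2$, the constants `a`, `b`, `c`, and the choice of a few gradient-descent steps as the approximate minimization oracle (`approx_argmin` is a hypothetical helper). The sketch only shows how the learning rate $\eta$ can decide between convergence and divergence when the interaction term $b$ is large.

```python
import numpy as np

def approx_argmin(grad, z0, step, n_steps=5):
    """Hypothetical approximate minimization oracle:
    a fixed number of gradient-descent steps started at z0."""
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        z = z - step * grad(z)
    return z

def saddle_step(x, y, grad_x, grad_y, eta, step_x, step_y):
    """One update of (x, y) toward the approximate solutions (x', y')."""
    # x' approximately minimizes f(., y).
    x_prime = approx_argmin(lambda u: grad_x(u, y), x, step_x)
    # y' approximately maximizes f(x, .), i.e. minimizes -f(x, .).
    y_prime = approx_argmin(lambda v: -grad_y(x, v), y, step_y)
    return x + eta * (x_prime - x), y + eta * (y_prime - y)

# Assumed toy problem: f(x, y) = a/2 x^2 + b x y - c/2 y^2, saddle point at (0, 0).
a, b, c = 1.0, 3.0, 1.0
grad_x = lambda x, y: a * x + b * y   # df/dx
grad_y = lambda x, y: b * x - c * y   # df/dy

def run(eta, n_iters=200):
    """Return the distance to the saddle point (0, 0) after n_iters updates."""
    x, y = np.array([1.0]), np.array([1.0])
    for _ in range(n_iters):
        x, y = saddle_step(x, y, grad_x, grad_y, eta, 0.5 / a, 0.5 / c)
    return float(np.hypot(x[0], y[0]))

print("eta = 0.1:", run(0.1))  # small learning rate: iterates approach the saddle point
print("eta = 1.0:", run(1.0))  # jumping fully to (x', y'): iterates move away from it
```

On this toy problem with exact minimizers, the iteration contracts only when $\eta < 2 / (1 + b^2/(ac))$, which is $0.2$ for the constants assumed here. Setting $\eta = 1$, that is, replacing $(x, y)$ by the oracle outputs outright, therefore diverges in this example.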
