We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is $\delta$-smooth, our algorithm guarantees convergence at a rate matching stochastic gradient descent on a $\delta$-smooth objective, which can lead to substantially better sample efficiency. Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.
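To make the high-level description concrete, the following is a minimal, hypothetical sketch of the proxy-based proximal point idea: at each outer step, draw one stochastic gradient of the hard objective, linearize the difference between objective and proxy around the current iterate, and approximately solve a regularized subproblem that only queries the proxy. All names and parameters here (grad_f_stochastic, grad_proxy, eta, the inner solver) are illustrative assumptions, not the paper's exact algorithm or notation.

# Hypothetical sketch: approximate proximal point on a proxy, corrected by
# occasional stochastic gradients of the true objective. Assumed form of the
# update; not the paper's exact method.
import numpy as np

def proxy_prox_point(x0, grad_f_stochastic, grad_proxy,
                     eta=0.1, outer_steps=100, inner_steps=20, inner_lr=0.05):
    """Approximately iterate
        x_{t+1} ~ argmin_x  f_proxy(x) + <g_t - grad_proxy(x_t), x> + ||x - x_t||^2 / (2*eta),
    where g_t is a single stochastic gradient of the objective at x_t."""
    x = np.array(x0, dtype=float)
    for _ in range(outer_steps):
        g_t = grad_f_stochastic(x)          # one (expensive) stochastic gradient of the objective
        correction = g_t - grad_proxy(x)    # linear model of (objective - proxy) at x_t
        y, x_t = x.copy(), x.copy()
        for _ in range(inner_steps):        # cheap inner solve that only queries the proxy
            inner_grad = grad_proxy(y) + correction + (y - x_t) / eta
            y -= inner_lr * inner_grad
        x = y
    return x

# Toy usage: the proxy is a noiseless quadratic; the "objective" adds a small
# smooth perturbation plus gradient noise.
if __name__ == "__main__":
    A = np.diag([1.0, 10.0])
    grad_proxy = lambda x: A @ x
    rng = np.random.default_rng(0)
    grad_f = lambda x: A @ x + 0.1 * np.sin(x) + 0.01 * rng.normal(size=x.shape)
    print(proxy_prox_point(np.ones(2), grad_f, grad_proxy))

The design intent of such a scheme is that the expensive objective is touched once per outer step, while the bulk of the optimization work happens in the cheap inner loop on the proxy; when the objective-minus-proxy difference is $\delta$-smooth, the linear correction term is a good local model of that difference.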