We consider unconstrained stochastic optimization problems with no available gradient information. Such problems arise in settings from derivative-free simulation optimization to reinforcement learning. We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a stochastic function using finite differences within a common random number framework. We develop modified versions of a norm test and an inner product quasi-Newton test to control the sample sizes used in the stochastic approximations and provide global convergence results to the neighborhood of the optimal solution. We present numerical experiments on simulation optimization problems to illustrate the performance of the proposed algorithm. When compared with classical zeroth-order stochastic gradient methods, we observe that our strategies of adapting the sample sizes significantly improve performance in terms of the number of stochastic function evaluations required.
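To make the common-random-number finite-difference idea mentioned above concrete, the following Python sketch shows one way such a gradient estimate could be formed: the same random seed (i.e., the same realization of the stochastic function) is reused at the base point and at each perturbed point before averaging over the sample. The names `F`, `seeds`, and the forward-difference form are illustrative assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

def crn_fd_gradient(F, x, seeds, h=1e-5):
    """Sketch of a sample-average forward-difference gradient estimate of
    E[F(x, xi)], reusing the same seed (realization xi) at x and x + h*e_i,
    i.e., a common-random-number (CRN) estimator. Illustrative only."""
    n = x.size
    g = np.zeros(n)
    for seed in seeds:                              # one realization per seed
        f_base = F(x, np.random.default_rng(seed))  # F(x, xi) at the base point
        for i in range(n):
            e = np.zeros(n)
            e[i] = h
            # Same seed => same realization xi at the perturbed point (CRN),
            # so the noise largely cancels in the difference quotient.
            g[i] += (F(x + e, np.random.default_rng(seed)) - f_base) / h
    return g / len(seeds)                           # average over the sample
```

In an adaptive sampling scheme of the kind described in the abstract, the length of `seeds` (the sample size) would be chosen by conditions such as the modified norm test or inner product quasi-Newton test rather than fixed in advance; the sketch above only illustrates the CRN gradient estimate itself.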