We provide the first generalization error analysis for black-box learning through derivative-free optimization. Under the assumption of a Lipschitz and smooth unknown loss, we consider the Zeroth-order Stochastic Search (ZoSS) algorithm, which updates a $d$-dimensional model by replacing stochastic gradient directions with stochastic differences of $K+1$ perturbed loss evaluations per dataset (example) query. For both unbounded and bounded, possibly nonconvex, losses, we present the first generalization bounds for the ZoSS algorithm. These bounds coincide with those for SGD and, rather surprisingly, are independent of $d$, $K$ and the batch size $m$, under appropriate choices of a slightly decreased learning rate. For bounded nonconvex losses and a batch size $m=1$, we additionally show that both the generalization error and the learning rate are independent of $d$ and $K$, and remain essentially the same as for SGD, even for two function evaluations. Our results extend, and consistently recover, established results for SGD in prior work, on both generalization bounds and the corresponding learning rates. If additionally $m=n$, where $n$ is the dataset size, we derive generalization guarantees for full-batch GD as well.
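The update described above, a gradient step in which the gradient is replaced by differences of $K+1$ loss evaluations per example, can be illustrated with a minimal sketch. The abstract does not specify the perturbation distribution or the perturbation size, so standard Gaussian directions and the smoothing parameter `mu` below are assumptions, as are all function names and the toy squared loss (chosen only for readability; it is not claimed to satisfy the Lipschitz assumption).

```python
import random


def zo_gradient_estimate(loss, w, z, K, mu, rng):
    """Gradient surrogate for loss(w, z) built from K + 1 loss evaluations.

    Assumption: perturbation directions are standard Gaussian vectors and
    mu > 0 is a small smoothing parameter; neither is given in the abstract.
    """
    base = loss(w, z)  # one evaluation at the unperturbed model
    d = len(w)
    grad = [0.0] * d
    for _ in range(K):  # K evaluations at perturbed models
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        shifted = [wi + mu * ui for wi, ui in zip(w, u)]
        scale = (loss(shifted, z) - base) / (mu * K)
        for i in range(d):
            grad[i] += scale * u[i]
    return grad


def zoss_step(loss, w, batch, K, mu, lr, rng):
    """One SGD-style update with the averaged zeroth-order surrogate."""
    d = len(w)
    avg = [0.0] * d
    for z in batch:
        g = zo_gradient_estimate(loss, w, z, K, mu, rng)
        for i in range(d):
            avg[i] += g[i] / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, avg)]


def squared_loss(w, z):
    """Hypothetical toy loss on an example z = (x, y)."""
    x, y = z
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (pred - y) ** 2


def mean_loss(w, data):
    return sum(squared_loss(w, z) for z in data) / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    w_true = [2.0, -1.0]
    data = []
    for _ in range(20):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        data.append((x, sum(a * b for a, b in zip(w_true, x))))

    w = [0.0, 0.0]
    print("initial mean loss:", mean_loss(w, data))
    for _ in range(500):
        batch = rng.sample(data, 4)  # batch size m = 4
        w = zoss_step(squared_loss, w, batch, K=5, mu=1e-3, lr=0.02, rng=rng)
    print("final mean loss:", mean_loss(w, data))
```

Setting `K=1` gives the two-function-evaluation case mentioned above, and passing the whole dataset as `batch` corresponds to the full-batch setting $m=n$. The learning rate and `mu` here are arbitrary illustrative values, not the choices under which the stated bounds hold.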