We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. The efficiency estimates we obtain clearly decouple the contributions of optimization error, gradient noise, and time drift. Notably, we show that the tracking efficiency of the proximal stochastic gradient method depends only logarithmically on the initialization quality when equipped with a step-decay schedule. Numerical experiments illustrate our results.
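To make the setting concrete, the following minimal sketch (our illustration, not the paper's algorithm or experiments) runs a proximal stochastic gradient method with a step-decay schedule and within-epoch iterate averaging on a toy drifting objective. Everything specific here is assumed for illustration: the quadratic-plus-l1 objective, the random-walk drift of the minimizer, the halving schedule, and all parameter values.

```python
import numpy as np

# A minimal sketch, assuming a drifting objective of the form
#   f_t(x) = 0.5 * ||x - m_t||^2 + 0.01 * ||x||_1,
# where the target m_t performs an unknown random walk (the "time drift")
# and only noisy gradients of the smooth part are observed.

rng = np.random.default_rng(0)
d, T, n_epochs = 10, 2000, 8
sigma, delta = 0.5, 0.01          # gradient-noise level, drift per step


def prox_l1(v, lam):
    # Proximal map of lam * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


x = 5.0 * np.ones(d)              # deliberately poor initialization
m = rng.normal(size=d)            # drifting minimizer, unknown to the algorithm
eta, epoch_len = 1.0, T // n_epochs

for t in range(T):
    if t % epoch_len == 0:
        if t > 0:
            eta *= 0.5            # step decay: halve the step size each epoch
        x_avg, count = np.zeros(d), 0   # restart iterate averaging each epoch
    g = (x - m) + sigma * rng.normal(size=d)   # noisy gradient of the smooth part
    x = prox_l1(x - eta * g, eta * 0.01)       # proximal stochastic gradient step
    count += 1
    x_avg += (x - x_avg) / count               # running average within the epoch
    m += delta * rng.normal(size=d)            # unknown dynamics move the target

print(f"tracking error of averaged iterate: {np.linalg.norm(x_avg - m):.3f}")
```

The three error sources named in the abstract are visible in the sketch: the initialization gap (x starts far from m), the gradient noise (sigma), and the drift (delta); the step-decay epochs are what let the initialization error be forgotten quickly while the averaged iterate smooths out the noise.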