Motivated by the continuous generation of high-frequency data streams, real-time learning is becoming increasingly important. Such data streams must be processed sequentially, with the added difficulty that they may change over time. In this streaming setting, we propose techniques for minimizing convex objectives through unbiased estimates of their gradients, commonly referred to as stochastic approximation problems. Our methods rely on stochastic approximation algorithms because of their broad applicability and computational advantages. In particular, they include iterate averaging, which guarantees optimal statistical efficiency under classical conditions. Our non-asymptotic analysis shows that convergence can be accelerated by choosing the learning rate according to the expected characteristics of the data stream. We show that the averaged estimate converges optimally and robustly at any data streaming rate. In addition, noise reduction can be achieved by processing the data in specific patterns, which is advantageous for large-scale machine learning problems. These theoretical results are illustrated for various data streams, demonstrating the effectiveness of the proposed algorithms.
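As a minimal sketch of the setting described above, the snippet below implements stochastic gradient descent with iterate averaging over a stream of mini-batches, under standard assumptions (a decaying learning rate and an unbiased gradient oracle). The names `averaged_sgd_stream` and `lsq_grad`, and the synthetic least-squares example, are illustrative assumptions, not the paper's actual algorithm or experiments.

```python
import numpy as np

def averaged_sgd_stream(grad, theta0, batches, lr0=1.0, alpha=0.66):
    """Averaged SGD over a stream of mini-batches (illustrative sketch).

    grad(theta, batch) must return an unbiased estimate of the gradient
    of the convex objective at theta, computed on `batch`.
    """
    theta = np.asarray(theta0, dtype=float)
    theta_bar = theta.copy()
    for t, batch in enumerate(batches, start=1):
        # Decaying learning rate gamma_t = lr0 * t^(-alpha), alpha in (1/2, 1),
        # the classical regime in which iterate averaging is statistically optimal.
        gamma = lr0 * t ** (-alpha)
        theta = theta - gamma * grad(theta, batch)
        # Running average of the iterates: the "averaged estimate".
        theta_bar += (theta - theta_bar) / t
    return theta_bar

# Usage example: streaming least squares with synthetic data. Averaging each
# mini-batch of n observations reduces the gradient noise by a factor of n,
# the kind of noise reduction that streaming in patterns can provide.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0])

def make_batch(n):
    X = rng.normal(size=(n, 2))
    y = X @ theta_star + rng.normal(size=n)
    return X, y

def lsq_grad(theta, batch):
    X, y = batch
    return X.T @ (X @ theta - y) / len(y)

batches = (make_batch(16) for _ in range(2000))
print(averaged_sgd_stream(lsq_grad, np.zeros(2), batches))  # approx. [1, -2]
```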