Host load prediction is essential for dynamic resource scaling and job scheduling in a cloud computing environment. In this context, workload prediction is challenging for three reasons. First, it must be accurate to enable precise scheduling decisions. Second, it must be fast so that scheduling decisions can be made at the right time. Third, the model must account for new workload patterns so that it performs well on both recent and older patterns. Inaccurate or slow predictions, or a failure to anticipate new usage patterns, can lead to severe outcomes such as service level agreement (SLA) violations. To mitigate these issues, we train a fast model based on the gated recurrent unit (GRU) that is capable of online adaptation. We take a multivariate approach, using features such as memory usage, CPU usage, disk I/O usage, and disk space, to make accurate predictions. Moreover, we predict multiple steps ahead, which is essential for making scheduling decisions in advance. Furthermore, we apply two pruning methods, L1-norm pruning and random pruning, to produce a sparse model for faster forecasts. Finally, we use online learning so that the model can adapt over time to new workload patterns.
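As a rough illustration of the two pruning criteria named above, the following sketch zeroes a chosen fraction of the entries in a single weight matrix, either by smallest absolute value (L1-norm pruning) or uniformly at random. It assumes unstructured, element-wise pruning of one matrix at a time; the abstract does not state the granularity, the sparsity level, or the framework used, so the function names (`l1_prune`, `random_prune`, `sparsity_of`) and the example weights are hypothetical.

```python
import random


def l1_prune(weights, sparsity):
    """Return a copy with the smallest-magnitude entries set to zero.

    `sparsity` is the fraction of entries to remove (assumed parameter).
    """
    # Collect (magnitude, row, column) for every entry in the matrix.
    entries = [(abs(w), r, c)
               for r, row in enumerate(weights)
               for c, w in enumerate(row)]
    n_prune = int(len(entries) * sparsity)
    entries.sort(key=lambda e: e[0])  # smallest magnitudes first
    pruned = [list(row) for row in weights]
    for _, r, c in entries[:n_prune]:
        pruned[r][c] = 0.0
    return pruned


def random_prune(weights, sparsity, seed=0):
    """Return a copy with a random subset of entries set to zero."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    positions = [(r, c)
                 for r, row in enumerate(weights)
                 for c in range(len(row))]
    n_prune = int(len(positions) * sparsity)
    pruned = [list(row) for row in weights]
    for r, c in rng.sample(positions, n_prune):
        pruned[r][c] = 0.0
    return pruned


def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


# Hypothetical 2x3 weight matrix standing in for one GRU gate's weights.
W = [[0.9, -0.05, 0.3],
     [-0.7, 0.01, 0.2]]

W_l1 = l1_prune(W, 0.5)        # removes 0.01, -0.05 and 0.2
W_rand = random_prune(W, 0.5)  # removes three entries chosen at random
```

Both variants reach the same sparsity, but L1-norm pruning keeps the largest weights, whereas random pruning ignores magnitude. A speed-up at inference time additionally depends on a runtime that exploits the zeros, which this sketch does not model.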
Title (translated from the Chinese): Efficient Online Host Workload Prediction Using Pruned GRU Neural Networks