Efficient prediction of internet traffic is essential for proactive management of computer networks. Machine learning approaches have shown promising performance in modeling complex real-world traffic. However, most existing works assume that the training and evaluation data come from an identical distribution. In practice, a deployed model is likely to encounter data from a slightly or entirely different distribution. This paper investigates and evaluates the performance of eXtreme Gradient Boosting, Light Gradient Boosting Machine, Stochastic Gradient Descent, Gradient Boosting Regressor, CatBoost Regressor, and their stacked ensemble model on both in-distribution and out-of-distribution data. Because the standalone models were unable to generalize well, we also propose a hybrid machine learning model that integrates wavelet decomposition to improve out-of-distribution prediction. In our experiments, the standalone ensemble model achieved the best performance, with an accuracy of 96.4%, and the hybrid ensemble model improved on it by 1% for in-distribution data. Performance dropped significantly, however, when testing on three different datasets whose distributions are shifted relative to the training set. Even so, the proposed hybrid model considerably reduces the performance gap between in-distribution and out-of-distribution evaluation compared with the standalone model, indicating the effectiveness of the decomposition technique for out-of-distribution generalization.
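To make the decomposition step concrete, the following is a minimal sketch of a multi-level wavelet decomposition of a traffic series. The abstract does not state which wavelet family or how many levels were used, so the Haar wavelet, the two-level depth, the function names, and the sample values are all illustrative assumptions rather than the authors' configuration. In a hybrid setup of the kind described, each resulting component could be passed to a regressor and the component forecasts recombined.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One level of the Haar transform: returns (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]  # smooth, low-frequency part
    detail = [(a - b) / SQRT2 for a, b in pairs]  # fluctuations, high-frequency part
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the recovered sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def haar_decompose(signal, levels):
    """Repeatedly split the approximation; details are ordered finest first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Rebuild the series from the coarsest approximation and all details."""
    out = list(approx)
    for detail in reversed(details):
        out = haar_inverse_step(out, detail)
    return out


# Hypothetical traffic volumes (made-up values, length divisible by 2**levels).
traffic = [10.0, 12.0, 11.0, 15.0, 30.0, 28.0, 12.0, 10.0]
approx, details = haar_decompose(traffic, levels=2)
restored = haar_reconstruct(approx, details)
```

Because the transform is invertible, no information is lost by modeling the components separately; the approximation captures the slow trend and the details capture short-term bursts.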