Real-world internet traffic is susceptible to various external and internal factors that may abruptly change the normal traffic flow. These unexpected changes are considered outliers in the traffic. Deep sequence models have been used to predict complex IP traffic, but their comparative performance on anomalous traffic has not been studied extensively. In this paper, we investigate and evaluate the performance of different deep sequence models for anomalous traffic prediction. Several deep sequence models were implemented to predict real traffic with and without outliers, showing the significance of outlier detection in real-world traffic prediction. First, two outlier detection techniques, the Three-Sigma rule and Isolation Forest, were applied to identify the anomalies. Second, we adjusted the abnormal data points using the Backward Filling technique before training the models. Finally, the performance of the different models was compared on the abnormal and the adjusted traffic. LSTM_Encoder_Decoder (LSTM_En_De) is the best prediction model in our experiment, reducing the deviation between actual and predicted traffic by more than 11% after the outliers are adjusted. The other models, namely Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), LSTM_En_De with an attention layer (LSTM_En_De_Atn), and Gated Recurrent Unit (GRU), also predict better after the outliers are replaced, with prediction error decreasing by more than 29%, 24%, 19%, and 10%, respectively. Our experimental results indicate that outliers in the data can significantly degrade prediction quality. Thus, outlier detection and mitigation help deep sequence models learn the general trend and make better predictions.
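To make the preprocessing step concrete, the following is a minimal sketch of Three-Sigma detection followed by backward filling, written in plain Python. It is an illustration under stated assumptions rather than the implementation used in the experiments: the function names `three_sigma_outliers` and `backward_fill`, the use of a global (not windowed) mean and standard deviation, and the threshold `k = 3` are all hypothetical choices. Isolation Forest is not shown.

```python
from statistics import mean, pstdev


def three_sigma_outliers(series, k=3.0):
    """Return a boolean mask marking points more than k standard
    deviations from the mean (assumption: global statistics, k = 3)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series has no outliers under this rule.
        return [False] * len(series)
    return [abs(x - mu) > k * sigma for x in series]


def backward_fill(series, mask):
    """Replace each flagged point with the next unflagged value.

    Flagged points at the end of the series have no later valid
    value and are left unchanged in this sketch.
    """
    filled = list(series)
    next_valid = None
    # Walk from the end so the nearest later valid value is always known.
    for i in range(len(filled) - 1, -1, -1):
        if mask[i]:
            if next_valid is not None:
                filled[i] = next_valid
        else:
            next_valid = filled[i]
    return filled


# Hypothetical traffic samples with one abrupt spike at index 20.
traffic = [10.0] * 20 + [1000.0] + [12.0] * 5
mask = three_sigma_outliers(traffic)
cleaned = backward_fill(traffic, mask)
```

In this sketch the spike is replaced by the value that follows it, so the adjusted series keeps its length and time alignment before being passed to a sequence model.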