We present a study of deep learning applied to network traffic forecasting. Forecasting is a key ingredient of network traffic engineering, e.g., intelligent routing, which can optimize network performance, especially in large networks. In a nutshell, we wish to predict the bit rate of a transmission in advance, based on the low-dimensional connection metadata ("flows") that is available whenever a communication is initiated. Our study makes several new contributions. First, it is performed on a large dataset (~50 million flows), which requires a new training scheme that operates on successive blocks of data, since the whole dataset is too large for in-memory processing. Second, we are the first to propose and perform a more fine-grained prediction that distinguishes between low, medium and high bit rates instead of just "mice" and "elephant" flows. Third, we apply state-of-the-art visualization and clustering techniques to flow data and show that the visualizations are insightful despite the heterogeneous and non-metric nature of the data. We developed a processing pipeline that handles the highly non-trivial data acquisition process and provides the preprocessing needed to apply deep neural networks (DNNs) to network traffic data. We conduct DNN hyper-parameter optimization as well as feature selection experiments, which clearly show that fine-grained network traffic forecasting is feasible and that domain-dependent data enrichment and augmentation strategies can improve results. The article concludes with an outlook on the fundamental challenges of network traffic analysis (high data throughput, unbalanced and dynamic classes, changing statistics, outlier detection).
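The block-wise processing and the three-class bit-rate labelling described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the class thresholds `LOW_MAX` and `MEDIUM_MAX`, the helper names, and the `(features, bit_rate)` record layout are all assumptions made for illustration, and a simple label counter stands in for the DNN update step.

```python
from collections import Counter
from itertools import islice

# Hypothetical class boundaries in bit/s; the abstract does not state them.
LOW_MAX = 1e4
MEDIUM_MAX = 1e6


def bitrate_class(bits_per_second):
    """Map a bit rate to 0 (low), 1 (medium) or 2 (high)."""
    if bits_per_second < LOW_MAX:
        return 0
    if bits_per_second < MEDIUM_MAX:
        return 1
    return 2


def iter_blocks(flows, block_size):
    """Yield successive lists of at most block_size items from any iterable."""
    it = iter(flows)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def train_blockwise(flows, block_size, update):
    """Call update(features, labels) once per block.

    Only one block is materialized at a time, so `flows` may be a lazy
    stream (e.g. a file reader) that is far larger than memory.
    """
    n_blocks = 0
    for block in iter_blocks(flows, block_size):
        features = [f for f, _ in block]
        labels = [bitrate_class(rate) for _, rate in block]
        update(features, labels)
        n_blocks += 1
    return n_blocks


# Stand-in for a model update: count how often each class occurs.
counts = Counter()


def count_labels(features, labels):
    counts.update(labels)


# Synthetic (features, bit_rate) records, generated lazily.
flows = (((i % 7,), 10.0 ** (i % 8)) for i in range(1000))
n_blocks = train_blockwise(flows, 128, count_labels)
```

In a real setting, `update` would perform one or more gradient steps of the network on the current block, and the class counts gathered here could inform re-weighting for the unbalanced classes mentioned in the outlook.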