Internet Traffic Volumes Are Not Gaussian -- They Are Log-Normal: An 18-Year Longitudinal Study With Implications for Modelling and Prediction (Complete Version)


12 February 2021

Mohammed Alasmar, Richard Clegg, Nickolay Zakhleniuk, George Parisis

From arXiv; 15 pages; accepted for publication in IEEE/ACM Transactions on Networking. arXiv admin note: substantial text overlap with arXiv:1902.03853.

Getting good statistical models of traffic on network links is a well-known, often-studied problem. A lot of attention has been given to correlation patterns and flow duration. The distribution of the amount of traffic per unit time is an equally important but less studied problem. We study a large number of traffic traces from many different networks, including academic, commercial and residential networks, using state-of-the-art statistical techniques. We show that traffic follows the log-normal distribution, which is a better fit than the Gaussian distribution commonly claimed in the literature. We also investigate an alternative heavy-tailed distribution (the Weibull) and show that it performs better than the Gaussian but worse than the log-normal. We examine anomalous traces that fit poorly under every distribution tried and show that this is often due to traffic outages or links that hit maximum capacity. We demonstrate that the data we look at is stationary if we consider samples 15 minutes long or even 1 hour long. This gives confidence that we can use the distributions for estimation and modelling purposes. We demonstrate the utility of our findings in two contexts: predicting the proportion of time that traffic will exceed a given level (for service level agreement or link capacity estimation) and predicting 95th percentile pricing. We also show that the log-normal distribution is a better predictor than the Gaussian or Weibull distributions in both contexts.
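The abstract's three technical ideas (comparing a Gaussian and a log-normal fit to per-interval traffic volumes, predicting how often traffic exceeds a level, and predicting the 95th percentile used for pricing) can be illustrated with a minimal sketch. It is not the authors' procedure, which this abstract does not describe. It assumes the volumes are a plain list of strictly positive numbers, fits each model by sample mean and standard deviation (of the raw values for the Gaussian, of their logarithms for the log-normal), and compares the fits by total log-likelihood. The Weibull is omitted. All function names are hypothetical, and the demo data are synthetic, not traces from the study.

```python
import math
import random
from statistics import NormalDist


def _normal_logpdf(y, dist):
    # Log-density of a Gaussian, computed directly to avoid pdf underflow.
    z = (y - dist.mean) / dist.stdev
    return -0.5 * z * z - math.log(dist.stdev * math.sqrt(2.0 * math.pi))


def fit_gaussian(samples):
    # Gaussian fitted by sample mean and sample standard deviation.
    return NormalDist.from_samples(samples)


def fit_lognormal(samples):
    # X is log-normal iff log(X) is Gaussian, so fit a Gaussian to log(samples).
    # Every sample must be strictly positive (e.g. an outage interval of 0 breaks this).
    return NormalDist.from_samples([math.log(x) for x in samples])


def gaussian_loglik(dist, samples):
    return sum(_normal_logpdf(x, dist) for x in samples)


def lognormal_loglik(log_dist, samples):
    # Change of variables: f_X(x) = f_Y(log x) / x, where Y = log X.
    return sum(_normal_logpdf(math.log(x), log_dist) - math.log(x) for x in samples)


def exceed_prob_gaussian(dist, level):
    # Predicted fraction of intervals whose traffic is above `level`.
    return 1.0 - dist.cdf(level)


def exceed_prob_lognormal(log_dist, level):
    # Same prediction under the log-normal model; `level` must be > 0.
    return 1.0 - log_dist.cdf(math.log(level))


def p95_lognormal(log_dist):
    # Model-based 95th percentile: exp of the 0.95 quantile of log(X).
    return math.exp(log_dist.inv_cdf(0.95))


def p95_empirical(samples):
    # Nearest-rank 95th percentile: sort, discard the top 5%,
    # and take the largest remaining sample (the billed rate).
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in exact integer arithmetic
    return ordered[rank - 1]


if __name__ == "__main__":
    # Synthetic, hypothetical per-interval volumes; not data from the study.
    rng = random.Random(1)
    samples = [rng.lognormvariate(10.0, 0.6) for _ in range(2000)]

    g, ln = fit_gaussian(samples), fit_lognormal(samples)
    print("log-likelihood, Gaussian:  ", gaussian_loglik(g, samples))
    print("log-likelihood, log-normal:", lognormal_loglik(ln, samples))

    level = 2.0 * g.mean  # hypothetical capacity / SLA threshold
    print("P(X > level), Gaussian:  ", exceed_prob_gaussian(g, level))
    print("P(X > level), log-normal:", exceed_prob_lognormal(ln, level))
    print("95th percentile, empirical: ", p95_empirical(samples))
    print("95th percentile, log-normal:", p95_lognormal(ln))
```

Both log-likelihoods are densities of the same variable (traffic volume), so the larger total indicates the better-fitting model on that sample. Because the log-normal density includes the `- log x` Jacobian term, it is compared fairly with the Gaussian. The two prediction functions simply read a tail probability or a quantile off whichever fitted model is chosen.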