The majority of Internet traffic is caused by a relatively small number of flows (so-called elephant flows). This phenomenon can be exploited to facilitate traffic engineering: resource-costly individual flow forwarding entries can be created only for elephants while serving mice over the shortest paths. Although this idea already appeared in proposed TE systems, it was not examined by itself. It remains unknown what extent of flow table occupancy and operations number reduction can be achieved or how to select thresholds or sampling rates to cover the desired fraction of traffic. In this paper, we use reproducible traffic models obtained from a 30-day-long campus trace covering 4 billion flows, to answer these questions. We establish theoretical boundaries for flow table usage reduction algorithms that classify flows since the first packet, after reaching a predefined counter threshold or detect elephants by sampling. An important finding is that simple packet sampling performs surprisingly well on realistic traffic, reducing the number of flow entries by a factor up to 400, still covering 80% of the traffic. We also provide an open-source software package allowing the replication of our experiments or the performing of similar evaluations for other algorithms or flow distributions.
翻译:大部分互联网流量是由数量相对较少的流量(所谓的大象流量)造成的。这一现象可以用来促进交通工程:在最短的路径上服务小鼠时,只能为大象创建资源成本较高的单个流量转发条目。虽然这个想法已经在拟议的TE系统中出现,但其本身并没有加以研究。仍然不清楚流表占用量和操作号的减少程度,或者如何选择阈值或取样率以覆盖所希望的流量部分。在本文中,我们使用30天的校园跟踪(覆盖40亿流量)获得的可复制流量模型来回答这些问题。我们为流表使用量减少算法制定了理论界限,在达到预先确定的反临界值后,或者通过取样检测对流进行分类。一个重要的发现是,简单的组合取样在现实交通上表现得惊人地好,将流量输入量减少到400倍,但仍覆盖80%的流量。我们还提供了一个开放源软件包,允许复制我们的实验,或者对其他算法或流量分布进行类似的评估。