To keep up with demand, servers will scale up to handle hundreds of thousands of clients simultaneously. Much of the community's focus has been on scaling servers in terms of aggregate traffic intensity (packets transmitted per second). However, bottlenecks caused by the increasing number of concurrent clients, and the resulting large number of concurrent flows, have received little attention. In this work, we focus on identifying such bottlenecks. In particular, we define two broad categories of problems: admitting more packets into the network stack than can be handled efficiently, and increasing per-packet overhead within the stack. We show that these problems contribute to high CPU usage and degrade network performance in terms of aggregate throughput and RTT. Our measurement and analysis are performed in the context of the Linux networking stack, the most widely used publicly available networking stack. Further, we discuss the relevance of our findings to other network stacks. The goal of our work is to highlight considerations required in the design of future networking stacks to enable efficient handling of large numbers of clients and flows.