When physical testbeds are out of reach for evaluating a networked system, we frequently turn to simulation. In today's datacenter networks, bottlenecks are rarely at the network protocol level, but instead in end-host software or hardware components, thus current protocol-level simulations are inadequate means of evaluation. End-to-end simulations covering these components on the other hand, simply cannot achieve the required scale with feasible simulation performance and computational resources. In this paper, we address this with SplitSim, a simulation framework for end-to-end evaluation for large-scale network and distributed systems. To this end, SplitSim builds on prior work on modular end-to-end simulations and combines this with key elements to achieve scalability. First, mixed fidelity simulations judiciously reduce detail in simulation of parts of the system where this can be tolerated, while retaining the necessary detail elsewhere. SplitSim then parallelizes bottleneck simulators by decomposing them into multiple parallel but synchronized processes. Next, SplitSim provides a profiler to help users understand simulation performance and where the bottlenecks are, so users can adjust the configuration. Finally SplitSim provides abstractions to make it easy for users to build complex large-scale simulations. Our evaluation demonstrates SplitSim in multiple large-scale case studies.
翻译:暂无翻译