Coupling regular topologies with optimized routing algorithms is key to pushing the performance of interconnection networks of HPC systems. In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalized Fat-Trees (PGFTs) which minimizes congestion risk even under massive topology degradation caused by equipment failure. It applies a modulo-based computation of forwarding tables among switches closer to the destination, using only knowledge of subtrees for pre-modulo division. Dmodc allows complete rerouting of topologies with tens of thousands of nodes in less than a second, which greatly helps centralized fabric management react to faults with high-quality routing tables and no impact on running applications in current and future very large-scale HPC clusters. We compare Dmodc against routing algorithms available in the InfiniBand control software (OpenSM), first for routing execution time to show feasibility at scale, and then for congestion risk under degradation to demonstrate robustness. The latter comparison is done using static analysis of routing tables under random permutation (RP), shift permutation (SP) and all-to-all (A2A) traffic patterns. Results show that, under heavy degradation, Dmodc's A2A and RP congestion risks are similar to those of the most stable algorithms compared, and that its SP congestion risk remains near-optimal up to 1% of random degradation.
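To make the idea of a modulo-based port choice and a static congestion check concrete, the following is a minimal Python sketch. It is not the Dmodc algorithm itself: it uses the classic "divide, then modulo" rule on an intact two-level fat-tree with no failures, and every name in it (`up_port`, `max_uplink_load`, the `divider` argument, the choice of a divider of 1 at the leaf level) is an assumption made for illustration only.

```python
from collections import Counter


def up_port(dest, divider, num_up_ports):
    """Pick an up port for destination `dest` (hypothetical helper).

    Divide-then-modulo rule: integer-divide the destination index by
    `divider`, then take the result modulo the number of up ports.
    """
    return (dest // divider) % num_up_ports


def max_uplink_load(num_leaves, nodes_per_leaf, num_up_ports, shift):
    """Static congestion check under a shift permutation.

    Assumes an intact two-level tree: node `src` sends to
    `(src + shift) % n`, and every leaf switch has `num_up_ports`
    uplinks. Returns the largest number of flows sharing one uplink.
    """
    n = num_leaves * nodes_per_leaf
    load = Counter()
    for src in range(n):
        dst = (src + shift) % n
        src_leaf = src // nodes_per_leaf
        if dst // nodes_per_leaf == src_leaf:
            continue  # traffic stays under the same leaf switch
        # Assumption: at the leaf level the divider is 1.
        port = up_port(dst, 1, num_up_ports)
        load[(src_leaf, port)] += 1
    return max(load.values(), default=0)


if __name__ == "__main__":
    # 4 leaves x 4 nodes, 4 uplinks per leaf, shift by one leaf.
    print(max_uplink_load(4, 4, 4, 4))
    # Same traffic with only 2 uplinks per leaf.
    print(max_uplink_load(4, 4, 2, 4))
```

In this toy setting, a maximum uplink load of 1 means no two flows of the shift permutation share an uplink; halving the number of uplinks forces flows to share. A degraded topology, which is the case the abstract is concerned with, would need per-switch knowledge of which subtrees remain reachable and is not modeled here.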