To ensure high availability, datacenter networks must rely on local fast rerouting mechanisms that allow routers to quickly react to link failures, in a fully decentralized manner. However, configuring these mechanisms to provide a high resilience against multiple failures while avoiding congestion along failover routes is algorithmically challenging, as the rerouting rules can only depend on local failure information and must be defined ahead of time. This paper presents a randomized local fast rerouting algorithm for Clos networks, the predominant datacenter topologies. Given a graph $G=(V,E)$ describing a Clos topology, our algorithm defines local routing rules for each node $v\in V$, which only depend on the packet's destination and are conditioned on the incident link failures. We prove that as long as number of failures at each node does not exceed a certain bound, our algorithm achieves an asymptotically minimal congestion up to polyloglog factors along failover paths. Our lower bounds are developed under some natural routing assumptions.
翻译:为确保高可用性,数据中心网络必须依靠本地快速改道机制,使路由器能够以完全分散的方式快速应对连接失败。然而,将这些机制配置为针对多重故障提供高抗御能力,同时避免交通不便沿线的拥堵,这在逻辑上具有挑战性,因为改道规则只能依赖本地故障信息,而且必须提前界定。本文为主要数据中心表层的克洛(Cloes)网络提供了随机的本地快速改道算法。根据描述布局表的图表$G=(V,E)$,我们的算法定义了每个节点$v\in V$的本地路由规则,每个节点仅取决于包的目的地,并以事故链接故障为条件。我们证明,只要每个节点的故障数量不超过一定的界限,我们的算法就会在不连续路径路径上实现微小的混杂症成。我们较低的界限是在一些自然路由假设下形成的。