Wide-area cloud provider networks must support the bandwidth requirements of diverse services (e.g., applications, product groups, customers) despite failures. Existing traffic engineering (TE) schemes operate at a much coarser granularity than individual services, which we show forces unduly conservative decisions. To address this, we present FloMore, which directly considers the bandwidth needs of individual services and ensures they are met for a desired percentage of time. Rather than meeting the requirements of all services over the same set of failure states, FloMore exploits a key opportunity: each service can meet its bandwidth requirements over a different set of failure states. FloMore consists of an offline phase that identifies the critical failure states of each service and an online phase that, on failure, allocates traffic in a manner that prioritizes the services for which that failure state is critical. We present a novel decomposition scheme that makes FloMore's offline phase tractable. Our evaluations show that FloMore outperforms state-of-the-art TE schemes, including SMORE and Teavar, as well as extensions of these schemes that we devise. The results also show that FloMore's decomposition approach allows it to scale well to larger network topologies.