Applications running in geographically distributed settings are becoming prevalent. Large-scale online services often share or replicate their data across multiple data centers (DCs) in different geographic regions. Driven by the communication needs of these applications, the inter-datacenter network (IDN) is becoming increasingly important. However, we find congestion control for inter-datacenter networks quite challenging. First, inter-datacenter communication involves both data center networks (DCNs) and the wide-area networks (WANs) connecting multiple data centers. Such a network environment has highly heterogeneous characteristics (e.g., buffer depths, RTTs). Existing congestion control mechanisms consider either DCN or WAN congestion, but do not capture the degree of congestion in both simultaneously. Second, to reduce evolution cost and improve flexibility, large enterprises have been building and deploying their wide-area routers based on shallow-buffered switching chips. However, with legacy congestion control mechanisms (e.g., TCP Cubic), shallow buffers can easily be overwhelmed by wide-area traffic with a large bandwidth-delay product (BDP), leading to high packet loss and degraded throughput. This thesis describes my research on optimizing congestion control mechanisms for inter-datacenter networks. First, we design GEMINI, a practical congestion control mechanism that simultaneously handles congestion in both the DCN and the WAN. Second, we present FlashPass, a proactive congestion control mechanism that achieves near-zero loss without degrading throughput in shallow-buffered WANs. Extensive evaluation shows that both outperform existing congestion control mechanisms.