Scaling Data Center TCP to Terabits with Laminar

Laminar is the first TCP stack designed for the reconfigurable match-action table (RMT) architecture, which is widely used in high-speed programmable switches and SmartNICs. Laminar reimagines TCP processing as a pipeline of simple match-action operations, enabling line-rate performance with low latency and minimal energy consumption while maintaining compatibility with standard TCP and POSIX sockets. Leveraging novel techniques such as optimistic concurrency, pseudo segment updates, and bump-in-the-wire processing, Laminar handles the transport logic, including retransmission, reassembly, flow control, and congestion control, entirely within the RMT pipeline. We prototype Laminar on an Intel Tofino2 switch and demonstrate its scalability to terabit speeds, its flexibility, and its robustness to network dynamics. Laminar delivers RDMA-equivalent performance: with short RPC workloads, it saves up to 16 host CPU cores versus the TAS kernel-bypass TCP stack while achieving $1.3\times$ higher peak throughput at $5\times$ lower 99.99p tail latency. At scale, Laminar drives nearly 1 Bpps of TCP processing while keeping RPC tail latency near $20\,\mu$s. For streaming workloads, Laminar achieves 25 Mpps per core, enough to saturate the line rate. It significantly benefits real applications: a key-value store on Laminar doubles throughput-per-watt while maintaining a 99.99p tail latency lower than TAS's best-case tail latency, and SPDK's NVMe-oTCP reaches RDMA-level efficiency. To demonstrate Laminar's flexibility, we implement TCP stack extensions, including a sequencer API for a linearizable distributed shared log, Timely congestion control, and delayed ACKs. Finally, Laminar generalizes to FPGA SmartNICs, delivering $3\times$ ToNIC's packet rate under equal timing.
