Distributed protocols are widely used to support network functions such as clock synchronization and multicast. As networks grow larger and faster, it is increasingly challenging for these protocols to react quickly to network events. The theory community has made significant progress in developing distributed message passing algorithms with improved convergence times. With the emergence of programmable switches, it is now feasible to adopt and adapt these theoretical advances for networking functions. In this paper, we propose FRANCIS, a new framework for running message passing algorithms on programmable switches to enable fast reactions to network events in large networks. We introduce an execution engine with computing and communication primitives for supporting message passing algorithms in P4 switches. We exemplify the framework's usefulness by improving the resiliency and reaction times of clock synchronization and source-routed multicast. In particular, our approach achieves lower clock drift than Sundial and PTP, quickly recovers from multiple failures, and reduces the time uncertainty bound by up to 5x. Compared with state-of-the-art multicast solutions, our approach uses packet headers up to 33% smaller and reacts an order of magnitude faster.