Traditional network monitoring solutions usually lack scalability because of their centralized nature: a single controller collects heartbeats from all network components. As a solution, the In-Band Network Telemetry (INT) framework has recently been proposed to collect network telemetry information more autonomously and in a distributed manner by employing programmable switches. However, INT poses further challenges: (i) finding suitable INT paths that optimize control overhead and information freshness, and (ii) ensuring reliable delivery of control information over multi-hop INT paths. In this work, we propose a monitoring scheme, reliable Graph Partitioned INT (GPINT), which extends our previous work by integrating shared queue ring (SQR) as a reliability feature against potential failures in network telemetry collection; such failures, caused by network congestion and link degradation, may lead to loss of network visibility. We implement our proposal in P4, a recent data plane programming language, and compare it with the traditional Simple Network Management Protocol (SNMP) and with another state-of-the-art study that employs Euler's method for INT path generation. Our analysis first shows the importance of having a data recovery mechanism against packet losses under different network conditions. Then, our emulation results indicate that GPINT with the reliability extension performs much better than the competing monitoring scheme in terms of telemetry collection latency and monitoring overhead, even under a high amount of packet loss.