With the increasing scale and complexity of cloud systems and big data analytics platforms, it is becoming increasingly challenging to understand and diagnose how a service request is processed in such distributed platforms. One way to address this problem is to accurately capture the complete end-to-end execution path of service requests across all involved components. This paper presents REPTrace, a generic methodology for capturing such execution paths transparently. We analyze a comprehensive list of execution scenarios and propose principles and algorithms for generating the end-to-end request execution path in all of them. Moreover, this paper presents an anomaly detection approach that exploits request execution paths to detect execution anomalies during request processing. Experiments on four popular distributed platforms with different workloads show that REPTrace transparently captures accurate request execution paths with reasonable latency and negligible network overhead. Fault injection experiments show that execution anomalies are detected with high recall (96%).
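The abstract does not say how paths are assembled or how anomalies are flagged. As a rough illustration only, the Python sketch below shows one generic way such a pipeline could look. It pairs each captured send with its matching receive to form the hops of a request's path. It then flags requests whose path signature never appeared in fault-free runs. The `Event` schema, `build_paths`, and `detect_anomalies` are hypothetical names, and this is not REPTrace's algorithm. Ordering hops by the sender's local timestamp also assumes roughly synchronized clocks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A captured network event (hypothetical schema, not REPTrace's)."""
    request_id: str   # request this event belongs to
    component: str    # process or service that observed the event
    kind: str         # "send" or "recv"
    msg_id: str       # identifier shared by a matching send/recv pair
    timestamp: float  # local timestamp at the observing component


def build_paths(events):
    """Assemble one path per request as an ordered tuple of (sender, receiver) hops."""
    by_request = defaultdict(list)
    for e in events:
        by_request[e.request_id].append(e)

    paths = {}
    for rid, evs in by_request.items():
        # Index sends by message id so each receive can find its matching send.
        sends = {e.msg_id: e for e in evs if e.kind == "send"}
        hops = []
        for e in evs:
            if e.kind == "recv" and e.msg_id in sends:
                s = sends[e.msg_id]
                hops.append((s.timestamp, s.component, e.component))
        hops.sort()  # order hops by the sender's timestamp (assumes synced clocks)
        paths[rid] = tuple((src, dst) for _, src, dst in hops)
    return paths


def detect_anomalies(paths, normal_signatures):
    """Return ids of requests whose path never occurred in fault-free runs."""
    return sorted(rid for rid, path in paths.items() if path not in normal_signatures)
```

In use, `normal_signatures` would be `set(build_paths(training_events).values())` computed from fault-free runs. A request whose backend hop is missing, for example, then yields a shorter, unseen signature and is reported. Exact-match signatures are the simplest possible model. A real system would likely need to tolerate benign path variation, such as retries or a varying number of parallel hops.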