Sampling is often a necessary evil to reduce the processing and storage costs of distributed tracing. In this work, we describe a scalable and adaptive sampling approach that can preserve events of interest better than the widely used head-based sampling approach. Sampling rates can be chosen individually and independently for every span, which allows span attributes and local resource constraints to be taken into account. The resulting traces are often only partially rather than completely sampled, which complicates statistical analysis. To exploit the available information, an unbiased estimation algorithm is presented. Even though it does not need to know whether the traces are complete, it reduces the estimation error in many cases compared to considering only complete traces.
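To illustrate how per-span sampling rates produce partially sampled traces, the following minimal Python sketch may help. It is not the estimation algorithm presented in this work. It assumes, purely for illustration, that all spans of a trace share one uniform random value and that a span is kept when this value is below the span's own sampling probability. The function names `sample_trace` and `estimate_span_count` are hypothetical. The estimate shown is a plain inverse-probability (Horvitz-Thompson) weighting of individual spans, which is unbiased for span counts but does not address quantities that depend on several spans of the same trace.

```python
import random


def sample_trace(spans, rng):
    """Return the spans of one trace that survive sampling.

    spans: list of (name, sampling_probability) tuples.
    Assumption (not taken from the source): one uniform random value r is
    drawn per trace and shared by all its spans; a span is kept if r < p.
    Spans with a small p are dropped first, so traces are often partial.
    """
    r = rng.random()
    return [(name, p) for name, p in spans if r < p]


def estimate_span_count(sampled_spans, name):
    """Inverse-probability estimate of how many spans called `name` existed.

    Each kept span stands for 1/p original spans, which makes the sum an
    unbiased estimate of the true count.
    """
    return sum(1.0 / p for n, p in sampled_spans if n == name)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical trace shape: a cheaply sampled span and a rarely sampled one.
    trace = [("frontend", 0.5), ("db", 0.1)]
    kept = []
    for _ in range(100_000):
        kept.extend(sample_trace(trace, rng))
    print("estimated frontend spans:", estimate_span_count(kept, "frontend"))
    print("estimated db spans:", estimate_span_count(kept, "db"))
```

Under this shared-value assumption, a span with a lower sampling probability is only ever kept together with every span of the same trace that has a higher probability, so the sampled traces are nested subsets rather than arbitrary fragments.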