Stream monitoring is fundamental in many data stream applications, such as financial data trackers, security, anomaly detection, and load balancing. In that respect, quantiles are of particular interest, as they often capture the user's utility. For example, if a video connection has high tail latency, the perceived quality will suffer, even if the average and median latencies are low. In this work, we consider the problem of approximating the per-item quantiles. Elements in our stream are (ID, latency) tuples, and we wish to track the latency quantiles for each ID. Existing quantile sketches are designed for a single number stream (e.g., containing just the latency). While one could allocate a separate sketch instance for each ID, this may require an infeasible amount of memory. Instead, we consider tracking the quantiles for the heavy hitters (most frequent items), which are often considered particularly important, without knowing them beforehand. We first present a simple sampling algorithm that serves as a benchmark. Then, we design an algorithm that augments a quantile sketch within each entry of a heavy hitter algorithm, resulting in similar space complexity but with a deterministic error guarantee. Finally, we present SQUAD, a method that combines sampling and sketching while improving the asymptotic space complexity. Intuitively, SQUAD uses a background sampling process to capture the behaviour of the latencies of an item before it is allocated with a sketch, thereby allowing us to use fewer samples and sketches. Our solutions are rigorously analyzed, and we demonstrate the superiority of our approach using extensive simulations.
翻译:串流监测在许多数据流应用中至关重要, 如金融数据追踪器、安全性、异常检测和负负平衡。 在这方面, 量数特别有意义, 因为它们往往能捕捉用户的效用。 例如, 如果视频连接尾部延缓度高, 感知质量也会受到影响, 即使平均和中位延迟度低。 在这项工作中, 我们考虑对每个项目数量进行接近的问题。 我们的流中元素是( ID 、 latency) 平流图, 我们希望跟踪每个 ID 的延缓量 。 现有的量数数图是专门为单个数字流设计的( 例如, 仅包含延缓度 ) 。 如果一个视频连接点可以为每ID 分配一个单独的草样实例, 这可能要求一个不切实际的记忆量。 相反, 我们考虑跟踪重击点( 最经常出现的项目) 的孔数, 通常被认为特别重要, 但不事先知道它们。 我们首先提出一个简单的取样算法作为基准。 然后, 我们设计一个简单的缩算法, 我们用一个更精确的缩的缩缩略图, 最终的算法, 将我们用一个重的缩缩算方法, 使我们用一个重的缩缩算法 。