We propose TOD, a system for efficient and scalable outlier detection (OD) on distributed multi-GPU machines. A key idea behind TOD is decomposing OD applications into basic tensor algebra operations. This decomposition enables TOD to accelerate OD computations by leveraging recent advances in deep learning infrastructure in both hardware and software. Moreover, to deploy costly OD algorithms on modern GPUs with limited memory, we introduce two key techniques. First, provable quantization speeds up OD computation and reduces its memory footprint by performing specific floating-point operations in lower precision while provably guaranteeing no accuracy loss. Second, to exploit the aggregated compute resources and memory capacity of multiple GPUs, we introduce automatic batching, which decomposes OD computations into small batches for parallel execution on multiple GPUs. TOD supports a comprehensive and diverse set of OD algorithms, e.g., LOF, PCA, and HBOS, and utility functions. Extensive evaluation on both real and synthetic OD datasets shows that TOD is on average 11.6x faster than the leading CPU-based OD system PyOD (with a maximum speedup of 38.9x), and can handle much larger datasets than various GPU baselines. Notably, TOD allows straightforward integration of additional OD algorithms and provides a unified framework for combining classical OD algorithms with deep learning methods. These combinations result in an infinite number of OD methods, many of which are novel and can be easily prototyped in TOD.
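
To make the tensor-decomposition idea concrete, the following is a minimal sketch (not TOD's actual API) of expressing one OD primitive, the k-nearest-neighbor distances used by detectors such as LOF, as plain tensor algebra so it runs on GPU-backed deep learning infrastructure (PyTorch is assumed here only for illustration; the function name and shapes are our own).

```python
# Illustrative sketch: kNN distances via one matrix multiply plus a top-k
# reduction -- the kind of tensor-algebra decomposition the abstract describes.
import torch

def knn_distances(X, k):
    """Distances from each row of X (n x d) to its k nearest neighbors."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    X = torch.as_tensor(X, dtype=torch.float32, device=device)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b, so the full n x n squared
    # distance matrix reduces to one matmul plus broadcasting.
    sq = (X * X).sum(dim=1, keepdim=True)                 # (n, 1)
    d2 = (sq + sq.T - 2.0 * (X @ X.T)).clamp_(min=0.0)    # (n, n)
    # Take the k + 1 smallest entries per row, since each point is its own
    # nearest neighbor at distance zero.
    vals, _ = torch.topk(d2, k + 1, dim=1, largest=False)
    return vals[:, 1:].sqrt()
```

Higher-level detectors, the provable-quantization rule, and multi-GPU automatic batching would compose on top of primitives like this; the sketch only illustrates the decomposition idea stated in the abstract, not TOD's implementation.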