Outlier detection (OD) is a key learning task for finding rare and deviant data samples, with many time-critical applications such as fraud detection and intrusion detection. In this work, we propose TOD, the first tensor-based system for efficient and scalable outlier detection on distributed multi-GPU machines. A key idea behind TOD is decomposing complex OD applications into a small collection of basic tensor algebra operators. This decomposition enables TOD to accelerate OD computations by leveraging recent advances in deep learning infrastructure in both hardware and software. Moreover, to deploy memory-intensive OD applications on modern GPUs with limited on-device memory, we introduce two key techniques. First, provable quantization speeds up OD computations and reduces their memory footprint by automatically performing specific floating-point operations in lower precision while provably guaranteeing no accuracy loss. Second, to exploit the aggregated compute resources and memory capacity of multiple GPUs, we introduce automatic batching, which decomposes OD computations into small batches for both sequential execution on a single GPU and parallel execution on multiple GPUs. TOD supports a diverse set of OD algorithms. Extensive evaluation on 11 real and 3 synthetic OD datasets shows that TOD is on average 10.9x faster than the leading CPU-based OD system PyOD (with a maximum speedup of 38.9x), and can handle much larger datasets than existing GPU-based OD systems. In addition, TOD allows easy integration of new OD operators, enabling fast prototyping of emerging and yet-to-be-discovered OD algorithms.
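To make the two core ideas concrete, below is a minimal sketch (not TOD's actual API) of how a classic kNN-based outlier score can be decomposed into basic tensor operators (here, PyTorch's cdist and topk) and evaluated in small batches so the full pairwise distance matrix never has to fit in device memory. The function name and parameters are hypothetical, chosen only for illustration.

```python
import torch

def knn_outlier_scores(X, k=5, batch_size=1024, device="cpu"):
    """Score each row of X by the distance to its k-th nearest neighbor.

    Hypothetical sketch of the tensor-operator decomposition described in
    the abstract: the OD computation reduces to cdist + topk, and batching
    bounds peak memory to a (batch_size x n) distance tile.
    """
    X = X.to(device)
    scores = []
    for start in range(0, X.shape[0], batch_size):
        batch = X[start:start + batch_size]
        # Basic tensor operator 1: pairwise distances between the
        # current batch and the whole dataset.
        dists = torch.cdist(batch, X)
        # Basic tensor operator 2: the k+1 smallest distances per row.
        # Each point's distance to itself (0) is included, so the k-th
        # nearest neighbor is the last of the k+1 sorted values.
        knn = torch.topk(dists, k + 1, largest=False).values[:, -1]
        scores.append(knn)
    return torch.cat(scores)

# Example: score 10,000 random 16-dimensional points.
scores = knn_outlier_scores(torch.randn(10_000, 16), k=5)
print(scores.shape)  # torch.Size([10000])
```

In this sketch, batches are processed sequentially on one device; the same decomposition would let independent batches be dispatched to different GPUs in parallel, which is the intuition behind the automatic batching technique summarized above.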