The emergence of novel hardware accelerators has powered the tremendous growth of machine learning in recent years. These accelerators deliver incomparable performance gains in processing high-volume matrix operations, particularly matrix multiplication, a core component of neural network training and inference. In this work, we explore opportunities for accelerating database systems using NVIDIA's Tensor Core Units (TCUs). We present TCUDB, a TCU-accelerated query engine that processes a set of query operators, including natural joins and group-by aggregates, as matrix operations within TCUs. Although matrix multiplication was considered an inefficient strategy for query processing in the past, it has remained largely unexplored in conventional GPU-based databases, which primarily rely on vector or scalar processing. We demonstrate the significant performance gain of TCUDB in a range of real-world applications, including entity matching, graph query processing, and matrix-based data analytics. TCUDB achieves up to 288x speedup compared to a baseline GPU-based query engine.
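The core idea of casting a natural join as matrix multiplication can be illustrated with a minimal toy sketch (this is an assumption-laden illustration of the general technique, not TCUDB's actual implementation): each relation is encoded as a 0/1 matrix indexed by the join attribute's domain, and the product counts the join witnesses for every output pair.

```python
import numpy as np

# Toy sketch: natural join R(x, y) JOIN S(y, z) via matrix multiplication.
# Relations and domain sizes below are made up for illustration.
R = [(0, 1), (1, 1), (2, 0)]          # tuples (x, y)
S = [(0, 2), (1, 0), (1, 2)]          # tuples (y, z)
nx, ny, nz = 3, 2, 3                  # sizes of the x, y, z domains

A = np.zeros((nx, ny), dtype=np.int32)
B = np.zeros((ny, nz), dtype=np.int32)
for x, y in R:
    A[x, y] = 1                       # A[x, y] = 1 iff (x, y) in R
for y, z in S:
    B[y, z] = 1                       # B[y, z] = 1 iff (y, z) in S

# C[x, z] = number of join witnesses |{y : (x,y) in R and (y,z) in S}|;
# on a GPU this product is exactly what a tensor core unit accelerates.
C = A @ B

join = [(x, z) for x in range(nx) for z in range(nz) if C[x, z] > 0]
print(sorted(join))                   # -> [(0, 0), (0, 2), (1, 0), (1, 2), (2, 2)]
```

Note that the same product also yields aggregate information for free: summing `C` along an axis gives a group-by count over the join result, which is one reason this formulation pairs naturally with group-by aggregates.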