We consider the question: what is the abstraction that should be implemented by the computational engine of a machine learning system? Current machine learning systems typically push whole tensors through a series of compute kernels such as matrix multiplications or activation functions, where each kernel runs on an AI accelerator such as a GPU. This implementation abstraction provides little built-in support for ML systems to scale past a single machine, or for handling large models whose matrices or tensors do not easily fit into the RAM of an accelerator. In this paper, we present an alternative implementation abstraction called the tensor relational algebra (TRA). The TRA is a set-based algebra grounded in the relational algebra. Expressions in the TRA operate over binary tensor relations, where keys are multi-dimensional arrays and values are tensors. The TRA is easily executed with high efficiency in a parallel or distributed environment, and it is amenable to automatic optimization. Our empirical study shows that an optimized TRA-based back-end can significantly outperform alternatives for running ML workflows on distributed clusters.
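To make the abstraction concrete, the sketch below models a binary tensor relation as a plain Python dictionary mapping integer key tuples to NumPy tensor blocks, and expresses tiled matrix multiplication as a relational join on the shared inner key followed by a grouped sum aggregation. This is a minimal illustration under assumed names (matmul_tra, A_rel, B_rel are hypothetical), not the paper's actual engine, which targets optimized parallel and distributed execution.

```python
# Hypothetical sketch of a TRA-style computation: tensor relations are
# sets of (key, value) pairs; here, dicts from coordinate tuples to blocks.
import numpy as np
from collections import defaultdict

def matmul_tra(A_rel, B_rel):
    """Block matrix multiply expressed relationally: join the two
    relations on the shared inner key k, multiply the joined tensor
    blocks, then aggregate (sum) products grouped by output key (i, j)."""
    out = defaultdict(lambda: 0)
    for (i, k), a_block in A_rel.items():
        for (k2, j), b_block in B_rel.items():
            if k == k2:  # join predicate on the inner dimension
                # aggregation: sum partial products sharing key (i, j)
                out[(i, j)] = out[(i, j)] + a_block @ b_block
    return dict(out)

# Tile two 4x4 matrices into 2x2 blocks keyed by block coordinates.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
A_rel = {(i, k): A[2*i:2*i+2, 2*k:2*k+2] for i in range(2) for k in range(2)}
B_rel = {(k, j): B[2*k:2*k+2, 2*j:2*j+2] for k in range(2) for j in range(2)}

C_rel = matmul_tra(A_rel, B_rel)
# Reassembling the blocks recovers the ordinary product A @ B.
C = np.block([[C_rel[(0, 0)], C_rel[(0, 1)]],
              [C_rel[(1, 0)], C_rel[(1, 1)]]])
assert np.allclose(C, A @ B)
```

Because the computation is phrased as a join followed by an aggregation over keyed blocks, each block product is an independent unit of work, which is the property that lets a relational-style engine repartition the relations and schedule the work across machines or devices whose RAM could not hold the whole tensors.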