Software packages like TensorFlow and PyTorch are designed to support linear algebra operations, and their speed and usability determine their success. However, by prioritising speed, they often neglect memory requirements. As a consequence, implementations of memory-intensive algorithms that are convenient in terms of software design often cannot be run for large problems due to memory overflows. Memory-efficient solutions require complex programming approaches with significant logic outside the computational framework, which impairs the adoption and use of such algorithms. To address this, we developed an XLA compiler extension that adjusts the computational data-flow representation of an algorithm according to a user-specified memory limit. We show that k-nearest neighbour and sparse Gaussian process regression methods can be run at a much larger scale on a single device, where standard implementations would have failed. Our approach leads to better use of hardware resources. We believe that further focus on removing memory constraints at the compiler level will widen the range of machine learning methods that can be developed in the future.
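To make the memory problem concrete: a naive k-nearest-neighbour implementation materialises the full pairwise distance matrix, whose size grows with the product of the two dataset sizes. The sketch below, written against plain NumPy under our own assumptions (the function name, chunk-sizing heuristic, and memory parameter are illustrative, not the paper's API), shows at the library level the kind of transformation a compiler-level split performs: the distance matrix is never materialised in full, and query rows are processed in chunks sized to a user-specified memory budget.

```python
import numpy as np

def knn_chunked(queries, data, k, memory_limit_bytes=64 * 2**20):
    """Illustrative sketch: k nearest neighbours under a memory budget.

    The full distance matrix would need len(queries) * len(data) * 8
    bytes of float64 storage. Instead, we process only as many query
    rows at a time as fit within memory_limit_bytes, mimicking (by
    hand) what a compiler-level data-flow split would do automatically.
    """
    n = len(data)
    # Number of distance-matrix rows that fit in the memory budget.
    rows_per_chunk = max(1, memory_limit_bytes // (n * 8))
    data_sq = (data ** 2).sum(axis=1)  # precomputed squared norms
    idx = np.empty((len(queries), k), dtype=np.int64)
    for start in range(0, len(queries), rows_per_chunk):
        q = queries[start:start + rows_per_chunk]
        # Squared Euclidean distances for this chunk only, via the
        # expansion ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, which
        # keeps peak memory at rows_per_chunk * n entries.
        d2 = (q ** 2).sum(axis=1)[:, None] - 2.0 * q @ data.T + data_sq[None, :]
        idx[start:start + len(q)] = np.argsort(d2, axis=1)[:, :k]
    return idx
```

Each chunk costs the same total arithmetic as the monolithic version; only peak memory changes. Performing this restructuring inside the compiler, rather than by hand as above, is what lets unmodified algorithm implementations scale to larger problems.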