In recent decades, High Performance Computing (HPC) hardware platforms have advanced significantly, delivering more processing power while keeping power consumption within reasonable limits. The Intelligence Processing Unit (IPU) is a new class of massively parallel processor designed to accelerate parallel computation through a large number of processing cores and on-chip memory components interconnected by high-speed fabrics. Although IPUs are primarily targeted at machine learning applications and ship with several libraries for implementing neural networks, they can also execute traditional parallel programs such as matrix multiplication, subject to certain considerations and limitations. This paper presents an analytical examination of matrix multiplication (MM) on an IPU, focusing on execution efficiency and memory usage, and compares the IPU against a GPU. Our findings indicate that IPUs can outperform modern GPUs, especially on skewed matrix multiplications, which are consistently challenging. To examine this further, we evaluate matrices of various aspect ratios on an IPU and a Turing-class GPU (RTX 2080 Ti) and find that the IPU consistently delivers more robust performance than the GPU on skewed matrices.
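To make the notion of aspect ratio concrete, the following is a minimal illustrative sketch, not the benchmark used in this work. It assumes that a "skewed" multiplication is one where the left operand A (m x k) is far from square, and it generates shapes whose arithmetic cost stays constant while the aspect ratio m/k varies in powers of two. The function names `skewed_shapes` and `flops`, the base size of 1024, and the fixed output width `n` are all hypothetical choices made for illustration.

```python
def skewed_shapes(base=1024, n=1024, max_shift=3):
    """Return (m, k, n) shapes for C[m, n] = A[m, k] @ B[k, n].

    Assumption for illustration: skew is the aspect ratio m/k of A.
    The product m * k is held constant, so every shape has the same
    multiply-add count and only the aspect ratio changes.
    """
    shapes = []
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            m, k = base << s, base >> s    # tall A: more rows, fewer columns
        else:
            m, k = base >> -s, base << -s  # wide A: fewer rows, more columns
        shapes.append((m, k, n))
    return shapes


def flops(m, k, n):
    """Floating-point operations of a dense m x k by k x n product."""
    return 2 * m * k * n


if __name__ == "__main__":
    for m, k, n in skewed_shapes():
        print(f"A: {m} x {k}, B: {k} x {n}, aspect ratio m/k = {m / k:g}, "
              f"FLOPs = {flops(m, k, n)}")
```

Holding the operation count fixed in this way separates the effect of matrix shape from the effect of problem size when timing the same product on different devices.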