Determining the peak usage of random-access memory (RAM), both system RAM and the RAM on a graphics processing unit (GPU), together with the utilization percentage of the central processing unit (CPU) and the GPU over the lifetime of a computing task, can be extremely useful for troubleshooting points of failure and for optimizing memory and processor utilization, especially in a high-performance computing (HPC) setting. Tools exist for tracking compute time, CPU utilization, and RAM usage, including the job management tools themselves, but to our knowledge there are currently no sufficient solutions for tracking GPU usage, particularly on Unix/Linux operating systems. We present gpu-tracker, a multi-operating system Python package that runs in the background to track the computational resource usage of a task, including the real compute time the task takes to complete, its maximum RAM usage, the average and maximum percentage of CPU utilization, the maximum GPU RAM usage, and the average and maximum percentage of GPU utilization for both Nvidia and AMD GPUs. We demonstrate that gpu-tracker can seamlessly track computational resource usage with minimal overhead in both desktop and HPC execution environments.