This paper addresses the challenge of mitigating carbon emissions from large-scale high-performance computing (HPC) systems and datacenters that host machine learning (ML) inference services. ML inference is critical to modern technology products, but it also accounts for a significant share of datacenter compute cycles and carbon emissions. We introduce Clover, a carbon-friendly ML inference service runtime system that balances performance, accuracy, and carbon emissions through mixed-quality models and GPU resource partitioning. Our experimental results demonstrate that Clover substantially reduces carbon emissions while maintaining high accuracy and meeting service level agreement (SLA) targets. Clover is therefore a promising solution for achieving carbon neutrality in HPC systems and datacenters.