Towards Management of Energy Consumption in HPC Systems with Fault Tolerance

From arXiv. This is the author version of the manuscript that was accepted for publication in the 2020 IEEE Biennial Congress of Argentina (ARGENCON) (ISBN 978-1-7281-5957-7/20).

High-performance computing continues to increase its computing power and energy efficiency. However, energy consumption continues to rise, and finding ways to limit and/or decrease it is a crucial point in current research. For high-performance MPI applications, there are rollback-recovery-based fault tolerance methods, such as uncoordinated checkpoints. These methods allow only some processes to roll back in the face of a failure, while the rest of the processes continue to run. In this article, we focus on the processes that continue execution, and propose a series of strategies to manage energy consumption when a failure occurs and uncoordinated checkpoints are used. We present an energy model to evaluate the strategies, and through simulation we analyze the behavior of an application under different configurations and failure times. As a result, we show the feasibility of improving energy efficiency in HPC systems in the presence of a failure.
