The latest version of the MPI standard introduces new functionality such as the Session model, but it still lacks fault management mechanisms. Past efforts produced tools and extensions to the MPI standard for handling faults, including ULFM. These measures are effective against faults but do not fully support the new additions to the standard. In this paper, we combine the fault management capabilities of ULFM with the Session model introduced in version 4.0 of the standard. We focus on the communicator creation procedure, highlighting its critical issues and proposing a method to circumvent them. The experimental campaign shows that the proposed solution does not significantly affect applications' execution time or scalability while better handling the occurrence of faults.