In this paper, we address the problem of detecting the moment when an ongoing asynchronous parallel iterative process can be terminated with a sufficiently precise solution to the fixed-point problem being solved. Formulating this detection problem as a global solution identification problem, we analyze the snapshot-based approach, which is the only one that allows for exact computation of the global residual error. Building on a recently developed approximate snapshot protocol that provides a reliable global residual error, we also experimentally investigate the reliability of a global residual error computed without any dedicated detection mechanism. Results on a single-site supercomputer show that such high-performance computing platforms may provide computational environments stable enough to simply rely on non-blocking reduction operations for computing reliable global residual errors, which yields noticeable time savings at both the implementation and execution levels.
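To make the contrast between an exact (snapshot) residual and a reduced residual concrete, here is a minimal sequential Python sketch. It is not the authors' protocol or code. It simulates p processes performing asynchronous block updates of a linear fixed-point problem x = Ax + b with bounded stale reads, stops as soon as the maximum of the latest local residuals (what a max-reduction over those values would return) falls below a threshold, and then reports the exact residual of the final global vector for comparison. The test problem, the max-norm, the delay model, and every function name here are illustrative assumptions. In a real MPI code the reduction step would typically be a non-blocking collective such as `MPI_Iallreduce` with `MPI_MAX`, completing while iterations continue; this simulation instead reads all local residuals instantly.

```python
import random


def make_problem(n, seed=0):
    """Hypothetical test problem x = A x + b: nonnegative A whose rows each sum
    to 0.5, so the map is a contraction in the max-norm."""
    rng = random.Random(seed)
    A = []
    for _ in range(n):
        row = [rng.random() for _ in range(n)]
        s = sum(row)
        A.append([0.5 * v / s for v in row])
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return A, b


def apply_rows(A, b, x, rows):
    """Evaluate f(x)_j = (A x + b)_j for the requested rows j."""
    return [sum(a * xv for a, xv in zip(A[j], x)) + b[j] for j in rows]


def exact_residual(A, b, x):
    """||x - f(x)||_inf on one consistent global vector, as a snapshot would give."""
    fx = apply_rows(A, b, x, range(len(x)))
    return max(abs(xi - fi) for xi, fi in zip(x, fx))


def async_iterate(n=8, p=4, eps=1e-8, max_delay=3, seed=1, max_ticks=100_000):
    A, b = make_problem(n, seed)
    blocks = [list(range(k * n // p, (k + 1) * n // p)) for k in range(p)]
    x = [0.0] * n
    history = [list(x)]             # recent global states, used to model stale reads
    local_res = [float("inf")] * p  # last local residual published by each process
    rng = random.Random(seed + 1)
    for tick in range(max_ticks):
        i = rng.randrange(p)  # the simulated process that updates at this tick
        delay = rng.randint(0, min(max_delay, len(history) - 1))
        view = list(history[-1 - delay])  # possibly outdated values of other blocks
        for j in blocks[i]:
            view[j] = x[j]                # a process always sees its own latest block
        new_vals = apply_rows(A, b, view, blocks[i])
        local_res[i] = max(abs(x[j] - v) for j, v in zip(blocks[i], new_vals))
        for j, v in zip(blocks[i], new_vals):
            x[j] = v
        history.append(list(x))
        if len(history) > max_delay + 1:
            history.pop(0)
        # Value a max-reduction over the latest local residuals would return.
        reduced = max(local_res)
        if reduced < eps:
            return tick, reduced, exact_residual(A, b, x)
    raise RuntimeError("no convergence within max_ticks")


tick, reduced, exact = async_iterate()
print(f"stopped at tick {tick}: reduced={reduced:.2e}, exact={exact:.2e}")
```

Because each entry of `local_res` comes from a different update, computed against a different view of the other blocks, the reduced value is not the residual of any single global vector; printing it next to `exact` is a simple way to gauge how far the two drift on a given run.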