Output-intensive scientific applications are highly sensitive to low storage throughput. While existing scientific application stacks are optimized for traditional High-Performance Computing (HPC) environments with high remote storage and network bandwidth, these assumptions often fail in modern settings like cloud deployment. This is because the existing scientific application I/O stack fails to leverage the available resources. At the same time, scientific applications exhibit special synchronization and data output requirements that are difficult to satisfy using traditional approaches such as block-level or filesystem-level caching. We introduce ParaLog, a distributed host-side logging approach designed to accelerate scientific applications transparently. ParaLog emphasizes deployability, enabling support for unmodified message passing interface (MPI) applications and implementations while preserving crash consistency semantics. We evaluate ParaLog across traditional HPC, cloud HPC, local clusters, and hybrid environments, demonstrating its capability to reduce end-to-end execution time by 13-26% for popular scientific applications in cloud settings.
翻译:暂无翻译