Modern I/O applications running on HPC infrastructures are increasingly read- and metadata-intensive. However, when multiple concurrent applications submit large numbers of metadata operations, they can easily saturate the metadata resources of the shared parallel file system, degrading overall performance and causing I/O unfairness. We present PADLL, an application- and file-system-agnostic storage middleware that enables QoS control of data and metadata workflows in HPC storage systems. It adopts ideas from Software-Defined Storage, building data plane stages that mediate and rate-limit POSIX requests submitted to the shared file system, and a control plane that holistically coordinates how all I/O workflows are handled. We demonstrate its performance and feasibility under multiple QoS policies using synthetic benchmarks, real-world applications, and traces collected from a production file system. Results show that PADLL can enforce complex storage QoS policies over concurrent metadata-aggressive jobs, ensuring fairness and prioritization.
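To make the idea of a rate-limiting data plane stage concrete, the following is a minimal sketch of a token-bucket limiter applied per request class. It is not PADLL's implementation: the token-bucket algorithm is a common rate-limiting technique swapped in for illustration, and the names `TokenBucket`, `Stage`, `admit`, and `set_rate`, the request classes, and the numeric limits are all hypothetical.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock            # injectable clock, useful for testing
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def set_rate(self, rate):
        # Settle tokens at the old rate before switching to the new one.
        self._refill()
        self.rate = float(rate)

    def try_acquire(self, n=1):
        # Consume n tokens if available; otherwise reject without blocking.
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


class Stage:
    """Hypothetical data plane stage: one bucket per request class."""

    def __init__(self, limits, clock=time.monotonic):
        # limits maps a request class to (rate in ops/s, burst size).
        self.buckets = {cls: TokenBucket(rate, burst, clock)
                        for cls, (rate, burst) in limits.items()}

    def admit(self, request_class):
        # Classes without a configured limit pass through unthrottled.
        bucket = self.buckets.get(request_class)
        return bucket is None or bucket.try_acquire()

    def set_rate(self, request_class, rate):
        # Hook a control plane could call to adjust a limit at runtime.
        self.buckets[request_class].set_rate(rate)


if __name__ == "__main__":
    stage = Stage({"metadata": (2, 2)})  # assumed limit: 2 ops/s, burst of 2
    for i in range(4):
        print("metadata request", i, "admitted:", stage.admit("metadata"))
```

In this sketch a rejected request is simply reported as not admitted; a real stage would more likely queue or delay the call. The `set_rate` hook stands in for the kind of adjustment a coordinating control plane might issue when it rebalances limits across jobs.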