Diverse workloads such as interactive supercomputing, big data analysis, and large-scale AI algorithm development require a high-performance scheduler. This paper presents a novel node-based scheduling approach for large-scale simulations of short-running jobs on MIT SuperCloud systems. The approach allows resources to be fully utilized for long-running batch jobs while simultaneously providing fast launch and release of large-scale short-running jobs. The node-based scheduling approach has demonstrated up to 100 times faster scheduler performance than other state-of-the-art systems.