Designing and implementing efficient parallel priority schedulers is an active research area. An intriguing proposed design is the Multi-Queue: given $n$ threads and $m\ge n$ distinct priority queues, each task is inserted into a queue chosen uniformly at random, while, to delete, a thread picks two queues uniformly at random and removes the higher-priority of the two tasks it observes. This approach scales well and has probabilistic rank guarantees: roughly, the rank of each removed task, relative to the tasks remaining in all other queues, is $O(m)$ in expectation. Yet the performance of this pattern is below that of well-engineered schedulers, which eschew theoretical guarantees for practical efficiency. We investigate whether it is possible to design and implement a Multi-Queue-based task scheduler that is both highly efficient and has analytical guarantees. We propose the Stealing Multi-Queue (SMQ), a cache-efficient variant of the Multi-Queue that leverages both queue affinity (each thread has a local queue, from which tasks are usually removed, but with some probability threads also attempt to steal higher-priority tasks from the other queues) and task batching, that is, the processing of several tasks in a single insert/delete step. These ideas are well known for task scheduling without priorities; our theoretical contribution is showing that, despite these relaxations, the design can still provide rank guarantees, which in turn imply bounds on the total work performed. We provide a general SMQ implementation that can surpass state-of-the-art schedulers such as Galois and PMOD in terms of performance on popular graph-processing benchmarks. Notably, the performance improvement comes mainly from the superior rank guarantees provided by our scheduler, confirming that analytically-reasoned approaches can still provide performance improvements for priority task scheduling.
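To make the two designs described in the abstract concrete, here is a minimal single-threaded sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the class names, the `p_steal` and `batch` parameters and their default values, the choice to sample the two queues with replacement, the choice to insert into the caller's local queue in the stealing variant, and the convention that a smaller key means higher priority are all hypothetical. A real scheduler would also need per-queue synchronization, which is omitted here.

```python
import heapq
import random


class MultiQueue:
    """Sequential simulation of a Multi-Queue over m binary heaps.

    Assumption: a smaller key means a higher priority.
    """

    def __init__(self, m, seed=None):
        self.queues = [[] for _ in range(m)]
        self.rng = random.Random(seed)

    def insert(self, priority, task):
        # Insert into one queue chosen uniformly at random.
        heapq.heappush(self.rng.choice(self.queues), (priority, task))

    def delete(self):
        # Sample two queues (with replacement, an assumption) and pop
        # from whichever has the higher-priority task at its top.
        a = self.rng.choice(self.queues)
        b = self.rng.choice(self.queues)
        candidates = [q for q in (a, b) if q]
        if not candidates:
            return None  # both sampled queues were empty; caller may retry
        best = min(candidates, key=lambda q: q[0])
        return heapq.heappop(best)


class StealingMultiQueue:
    """Sketch of queue affinity plus batched stealing (hypothetical parameters)."""

    def __init__(self, n_threads, p_steal=0.125, batch=4, seed=None):
        self.local = [[] for _ in range(n_threads)]
        self.p_steal = p_steal  # probability of attempting a steal
        self.batch = batch      # maximum number of tasks moved per steal
        self.rng = random.Random(seed)

    def insert(self, tid, priority, task):
        # Assumption: a thread inserts into its own local queue.
        heapq.heappush(self.local[tid], (priority, task))

    def delete(self, tid):
        mine = self.local[tid]
        # Usually serve from the local queue; attempt a steal with
        # probability p_steal, or whenever the local queue is empty.
        if not mine or self.rng.random() < self.p_steal:
            victim = self.rng.choice(self.local)
            better = victim and (not mine or victim[0] < mine[0])
            if victim is not mine and better:
                # Move a batch of the victim's best tasks in one step.
                for _ in range(min(self.batch, len(victim))):
                    heapq.heappush(mine, heapq.heappop(victim))
        return heapq.heappop(mine) if mine else None
```

In this sketch `delete` returns a `(priority, task)` pair, or `None` when the queues it inspected had nothing to offer, so callers are expected to retry. Setting `p_steal` to 0 makes a thread with a non-empty local queue behave like a purely local priority queue, while larger values trade cache locality for tighter global ordering.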