We present an efficient, parametric modeling framework for predictive resource allocation that focuses on the amount of computational resources and can optimize for a range of price-performance objectives for data analytics in serverless query processing settings. We discuss and evaluate in depth how our system, AutoExecutor, uses this framework to automatically select near-optimal executor and core counts for Spark SQL queries running on Azure Synapse. Our techniques improve upon Spark's built-in, reactive dynamic executor allocation by substantially reducing both the total number of executors allocated and the executor occupancy while queries run, thereby freeing up executors that can potentially be used by other concurrent queries or to reduce overall cluster provisioning needs. In contrast to post-execution analysis tools such as Sparklens, we predict resource allocations for queries before executing them, and we can account for changes in input data sizes when predicting the desired allocations.