LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications - 专知 Papers


April 4, 2022


Jinhan Xin,Kai Hwang,Zhibin Yu

From arXiv; 16 pages, 21 figures. Accepted by SIGMOD 2022 but not published. This arXiv version is an extended version, permitted by the conference chairs.

Spark SQL has been widely deployed in industry but it is challenging to tune its performance. Recent studies try to employ machine learning (ML) to solve this problem, but suffer from two drawbacks. First, it takes a long time (high overhead) to collect training samples. Second, the optimal configuration for one input data size of the same application might not be optimal for others. To address these issues, we propose a novel Bayesian Optimization (BO) based approach named LOCAT to automatically tune the configurations of Spark SQL applications online. LOCAT innovates three techniques. The first technique, named QCSA, eliminates the configuration-insensitive queries by Query Configuration Sensitivity Analysis (QCSA) when collecting training samples. The second technique, dubbed DAGP, is a Datasize-Aware Gaussian Process (DAGP) which models the performance of an application as a distribution of functions of configuration parameters as well as input data size. The third technique, called IICP, Identifies Important Configuration Parameters (IICP) with respect to performance and only tunes the important ones. As such, LOCAT can tune the configurations of a Spark SQL application with low overhead and adapt to different input data sizes. We employ Spark SQL applications from benchmark suites TPC-DS, TPC-H, and HiBench running on two significantly different clusters, a four-node ARM cluster and an eight-node x86 cluster, to evaluate LOCAT. The experimental results on the ARM cluster show that LOCAT accelerates the optimization procedures of the state-of-the-art approaches by at least 4.1x and up to 9.7x; moreover, LOCAT improves the application performance by at least 1.9x and up to 2.4x. On the x86 cluster, LOCAT shows similar results to those on the ARM cluster.
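The QCSA idea described in the abstract, dropping configuration-insensitive queries before collecting training samples, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how sensitivity is measured, so the sketch assumes the coefficient of variation of a query's run time across sampled configurations. The function names, the 0.1 threshold, and the timings are all hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(times):
    """Population standard deviation of the run times divided by their mean."""
    avg = mean(times)
    return pstdev(times) / avg if avg > 0 else 0.0


def split_by_sensitivity(run_times, threshold=0.1):
    """Partition queries into configuration-sensitive and -insensitive lists.

    run_times maps a query name to its execution times in seconds, one per
    sampled configuration. The threshold is an assumed cut-off, not a value
    taken from the paper.
    """
    sensitive, insensitive = [], []
    for query, times in run_times.items():
        if coefficient_of_variation(times) >= threshold:
            sensitive.append(query)
        else:
            insensitive.append(query)
    return sensitive, insensitive


# Made-up timings for three queries under four sampled configurations.
samples = {
    "q1": [10.0, 20.0, 15.0, 30.0],  # varies a lot with the configuration
    "q2": [5.0, 5.1, 4.9, 5.0],      # barely changes
    "q3": [8.0, 12.0, 9.0, 16.0],
}
sensitive, insensitive = split_by_sensitivity(samples)
print("keep for tuning:", sensitive)
print("skip when sampling:", insensitive)
```

Under this assumption, only the queries in `sensitive` would be re-run when gathering further training samples, which is where the reduction in sample-collection overhead would come from.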


