Auto Tuning of Hadoop and Spark Parameters - 专知 (Zhuanzhi) Papers


tuning · Hadoop · Spark · random search · grid search

November 4, 2021

Auto Tuning of Hadoop and Spark parameters


Tanuja Patanshetti, Ashish Anil Pawar, Disha Patel, Sanket Thakare

From arXiv. 12 pages, 9 figures, 12 tables. Published in the International Journal of Engineering Trends and Technology (IJETT).

Data on the order of terabytes, petabytes, or beyond is known as Big Data. Such data cannot be processed with traditional database software, hence the need for Big Data Platforms. By combining the capabilities and features of various big data applications and utilities, a Big Data Platform forms a single solution that helps to develop, deploy, and manage the big data environment. Hadoop and Spark are two open-source Big Data Platforms provided by Apache. Both platforms have many configuration parameters, which can have unforeseen effects on execution time, accuracy, and other metrics. Tuning these parameters manually is tedious, so automatic methods are needed. After studying and analyzing previous work on automating the tuning of these parameters, this paper proposes two algorithms for tuning them automatically: Grid Search with Finer Tuning and Controlled Random Search. The performance indicator studied in this paper is execution time. Experimental results show that Grid Search with Finer Tuning and Controlled Random Search reduce execution time by about 70% and 50%, respectively, for Hadoop, and by 81.19% and 77.77%, respectively, for Spark.
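The abstract names the two algorithms but does not describe them. The Python sketch below therefore uses generic stand-ins under stated assumptions, not the authors' procedures. "Grid search with finer tuning" is read as a coarse grid over the whole space followed by a finer grid around the best coarse point. "Controlled random search" is read as a fixed-budget random search whose sampling window shrinks around the best configuration found so far. The two Spark property names (`spark.executor.cores`, `spark.sql.shuffle.partitions`) are real, but their ranges and the `synthetic_execution_time` function are hypothetical. In practice, the cost function would launch the Hadoop or Spark job and measure its wall-clock time.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]
Space = Dict[str, Tuple[int, int]]
Cost = Callable[[Config], float]

# Hypothetical search space: real Spark property names, illustrative ranges.
SPACE: Space = {
    "spark.executor.cores": (1, 8),
    "spark.sql.shuffle.partitions": (8, 400),
}


def synthetic_execution_time(cfg: Config) -> float:
    """Hypothetical stand-in for running a job and timing it; NOT a model of Spark."""
    cores = cfg["spark.executor.cores"]
    parts = cfg["spark.sql.shuffle.partitions"]
    return 100.0 + 3.0 * (cores - 5) ** 2 + 0.002 * (parts - 120) ** 2


def linspace_int(lo: int, hi: int, n: int) -> List[int]:
    """About n evenly spaced integers in [lo, hi], deduplicated and sorted."""
    if n <= 1 or lo == hi:
        return [lo]
    step = (hi - lo) / (n - 1)
    return sorted({round(lo + i * step) for i in range(n)})


def grid_search(space: Space, cost: Cost, points: int) -> Tuple[Config, float, int]:
    """Exhaustive search over the Cartesian product of per-parameter grids."""
    names = list(space)
    axes = [linspace_int(space[k][0], space[k][1], points) for k in names]
    best: Config = {}
    best_t, evals = float("inf"), 0
    for combo in itertools.product(*axes):
        cfg = dict(zip(names, combo))
        t = cost(cfg)  # one full job run per evaluation
        evals += 1
        if t < best_t:
            best, best_t = cfg, t
    return best, best_t, evals


def grid_search_with_refinement(space: Space, cost: Cost, coarse: int = 5,
                                fine: int = 5) -> Tuple[Config, float, int]:
    """Coarse grid over the full space, then a finer grid around the best coarse point."""
    best, best_t, n1 = grid_search(space, cost, coarse)
    refined: Space = {}
    for k, (lo, hi) in space.items():
        half = max(1, (hi - lo) // max(1, coarse - 1))  # roughly one coarse step per side
        refined[k] = (max(lo, best[k] - half), min(hi, best[k] + half))
    best2, best2_t, n2 = grid_search(refined, cost, fine)
    if best2_t < best_t:
        best, best_t = best2, best2_t
    return best, best_t, n1 + n2


def shrinking_random_search(space: Space, cost: Cost, budget: int = 40,
                            shrink: float = 0.8, seed: int = 0) -> Tuple[Config, float, int]:
    """Fixed-budget random search; the sampling window narrows after each improvement."""
    rng = random.Random(seed)
    best = {k: rng.randint(lo, hi) for k, (lo, hi) in space.items()}
    best_t = cost(best)
    width = {k: float(hi - lo) for k, (lo, hi) in space.items()}
    for _ in range(budget - 1):
        cand = {}
        for k, (lo, hi) in space.items():
            half = max(1, int(width[k] / 2))
            # Sample inside the current window, clipped to the parameter's bounds.
            cand[k] = rng.randint(max(lo, best[k] - half), min(hi, best[k] + half))
        t = cost(cand)
        if t < best_t:
            best, best_t = cand, t
            width = {k: w * shrink for k, w in width.items()}
    return best, best_t, budget


if __name__ == "__main__":
    for name, search in [("grid + finer", grid_search_with_refinement),
                         ("shrinking random", shrinking_random_search)]:
        cfg, t, n = search(SPACE, synthetic_execution_time)
        print(f"{name}: {cfg} -> {t:.3f} (synthetic units) after {n} runs")
```

Each evaluation stands for a full job run. For that reason, both functions return the number of evaluations alongside the best configuration, which is the cost that matters when comparing the two strategies on a real cluster.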


