了解挑战和协助开发者开发火花应用 (Understanding the Challenges and Assisting Developers with Developing Spark Applications) - 专知论文

会员服务 ·

0

可理解性 · Processing（编程语言） · Performer · Spark · INFORMS ·

2021 年 3 月 25 日

Understanding the Challenges and Assisting Developers with Developing Spark Applications

翻译：了解挑战和协助开发者开发火花应用

To process data more efficiently, big data frameworks provide data abstractions to developers. However, due to the abstraction, there may be many challenges for developers to understand and debug the data processing code. To uncover the challenges in using big data frameworks, we first conduct an empirical study on 1,000 Apache Spark-related questions on Stack Overflow. We find that most of the challenges are related to data transformation and API usage. To solve these challenges, we design an approach, which assists developers with understanding and debugging data processing in Spark. Our approach leverages statistical sampling to minimize performance overhead, and provides intermediate information and hint messages for each data processing step of a chained method pipeline. The preliminary evaluation of our approach shows that it has low performance overhead and we receive good feedback from developers.

翻译：为了更高效地处理数据,大数据框架为开发者提供数据抽象数据,然而,由于抽象化,开发者可能面临许多挑战来理解和调试数据处理代码。为了发现使用大数据框架的挑战,我们首先对Stack overflow的1,000 Apache Spark相关问题进行实证研究。我们发现,大部分挑战都与数据转换和使用API有关。为了应对这些挑战,我们设计了一种方法,帮助开发者理解和调试斯帕克的数据处理。我们的方法利用统计抽样来尽量减少业绩管理费,并为链式方法管道的每个数据处理步骤提供中间信息和提示信息。我们对方法的初步评估表明,它的业绩管理费很低,我们从开发者那里得到了良好的反馈。

0

相关内容

可理解性

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

专知会员服务

197+阅读 · 2020年2月1日

【Manning干货书】机器学习系统可扩展规模设计（MLS），224页PDF

【Manning干货书】机器学习系统可扩展规模设计（MLS），224页PDF

专知会员服务

76+阅读 · 2020年1月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

计算机 | 国际会议信息5条

计算机 | 国际会议信息5条

Call4Papers

3+阅读 · 2019年7月3日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

计算机类 | ISCC 2019等国际会议信息9条

计算机类 | ISCC 2019等国际会议信息9条

Call4Papers

5+阅读 · 2018年12月25日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Advances and Challenges in Conversational Recommender Systems: A Survey

Arxiv

14+阅读 · 2021年1月23日

Advances in Natural Language Question Answering: A Review

Advances in Natural Language Question Answering: A Review

Arxiv

5+阅读 · 2019年4月10日

Incremental Reading for Question Answering

Incremental Reading for Question Answering

Arxiv

5+阅读 · 2019年1月15日

Nocaps: novel object captioning at scale

Nocaps: novel object captioning at scale

Arxiv

6+阅读 · 2018年12月20日

A Probe into Understanding GAN and VAE models

A Probe into Understanding GAN and VAE models

Arxiv

9+阅读 · 2018年12月13日

R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Arxiv

7+阅读 · 2018年5月24日

Stylistic Variation in Social Media Part-of-Speech Tagging

Arxiv

4+阅读 · 2018年4月19日

Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering

Arxiv

8+阅读 · 2018年3月29日

The Web as a Knowledge-base for Answering Complex Questions

Arxiv

5+阅读 · 2018年3月18日

A Big Data Analysis Framework Using Apache Spark and Deep Learning

Arxiv

3+阅读 · 2017年11月25日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

专知会员服务

197+阅读 · 2020年2月1日

【Manning干货书】机器学习系统可扩展规模设计（MLS），224页PDF

【Manning干货书】机器学习系统可扩展规模设计（MLS），224页PDF

专知会员服务

76+阅读 · 2020年1月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

基于大型语言模型的网络威胁情报：利用LLM提取MITRE ATT&CK技术 | 最新文献

无人机（UAV）战略：区域大国与暴力非国家行为体在中东冲突中对无人机的运用 | 130页

神经技术与未来无人机战争的交汇点 | 最新报告

美国从“蛛网行动”中汲取轰炸机舰队保护教训

相关资讯

计算机 | 国际会议信息5条

计算机 | 国际会议信息5条

Call4Papers

3+阅读 · 2019年7月3日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

计算机类 | ISCC 2019等国际会议信息9条

计算机类 | ISCC 2019等国际会议信息9条

Call4Papers

5+阅读 · 2018年12月25日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Advances and Challenges in Conversational Recommender Systems: A Survey

Arxiv

14+阅读 · 2021年1月23日

Advances in Natural Language Question Answering: A Review

Advances in Natural Language Question Answering: A Review

Arxiv

5+阅读 · 2019年4月10日

Incremental Reading for Question Answering

Incremental Reading for Question Answering

Arxiv

5+阅读 · 2019年1月15日

Nocaps: novel object captioning at scale

Nocaps: novel object captioning at scale

Arxiv

6+阅读 · 2018年12月20日

A Probe into Understanding GAN and VAE models

A Probe into Understanding GAN and VAE models

Arxiv

9+阅读 · 2018年12月13日

R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Arxiv

7+阅读 · 2018年5月24日

Stylistic Variation in Social Media Part-of-Speech Tagging

Arxiv

4+阅读 · 2018年4月19日

Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering

Arxiv

8+阅读 · 2018年3月29日

The Web as a Knowledge-base for Answering Complex Questions

Arxiv

5+阅读 · 2018年3月18日

A Big Data Analysis Framework Using Apache Spark and Deep Learning

Arxiv

3+阅读 · 2017年11月25日

微信扫码咨询专知VIP会员