SODA: A Semantics-Aware Optimization Framework for Data-Intensive Applications Using Hybrid Program Analysis

In the era of data explosion, a growing number of data-intensive computing frameworks, such as Apache Hadoop and Spark, have been proposed to process massive volumes of unstructured data in parallel. Because the programming models of these frameworks let users combine complex and diverse user-defined functions (UDFs) with predefined operations, tuning overall system performance becomes a major challenge when programmers do not fully understand the semantics of the code, the data, and the runtime system. In this paper, we design SODA, a holistic semantics-aware optimization framework for data-intensive applications that uses hybrid program analysis to help programmers resolve performance issues. SODA is a two-phase framework: the offline phase is a static analysis that examines the code, together with performance profiling data collected by the online phase during prior executions, to generate a parameterized and instrumented application; the online phase is a dynamic analysis that tracks the application's execution and collects runtime information about the data and the system. Extensive experiments on four real-world Spark applications show that SODA runs up to 60%, 10%, and 8% faster than the original implementations with its three proposed optimization strategies, i.e., cache management, operation reordering, and element pruning, respectively.
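The cache-management strategy builds on a standard property of Spark: unless a dataset is persisted, every action re-executes the lineage that produces the data it consumes. The Python sketch below is a hypothetical illustration of that idea, not SODA's actual analysis. It models a job's lineage as a dictionary mapping each dataset to its parents and flags datasets that lie on the lineage of two or more actions as caching candidates. The dataset names, the `ancestors` and `cache_candidates` helpers, and the simplified cost model (which ignores shuffle-file reuse and memory limits) are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every dataset that `node` is (transitively) derived from."""
    seen: Set[str] = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def cache_candidates(parents: Dict[str, List[str]], actions: List[str]) -> List[str]:
    """Return datasets that lie on the lineage of two or more actions.

    Simplified model (an assumption, not SODA's algorithm): with no caching,
    each action recomputes its whole lineage, so a dataset shared by k actions
    is computed k times. Shuffle-file reuse and memory limits are ignored.
    """
    uses: Dict[str, int] = defaultdict(int)
    for action in actions:
        for dataset in ancestors(action, parents):
            uses[dataset] += 1
    return sorted(d for d, count in uses.items() if count >= 2)


# Hypothetical job: `words` feeds two separate actions.
lineage = {
    "lines": ["input"],
    "words": ["lines"],
    "counts": ["words"],
    "longWords": ["words"],
    "saveCounts": ["counts"],    # action 1
    "countLong": ["longWords"],  # action 2
}
print(cache_candidates(lineage, ["saveCounts", "countLong"]))
# prints ['input', 'lines', 'words']
```

In this linear example, caching the most downstream shared dataset (`words`) is enough to avoid recomputing `lines` and re-reading `input` for the second action. Choosing among candidates in general requires a cost and memory model, which is the kind of information SODA's profiling phase is described as collecting.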
