This paper proposes a model for specifying data-flow-based parallel data processing programs that is agnostic of the target Big Data processing framework. The paper focuses on the formal abstract specification of non-iterative and iterative programs, generalizing the strategies adopted by data flow Big Data processing frameworks. The proposed model relies on monoid algebra and Petri nets to abstract Big Data processing programs at two levels: a high level representing the program data flow and a lower level representing data transformation operations (e.g., filtering, aggregation, join). We extend the model for data processing programs proposed in [1] to support iterative programs. A general specification of iterative data processing programs implemented by data-flow-based parallel programming models is essential given the democratization of iterative and greedy Big Data analytics algorithms. Indeed, these algorithms call for revisiting parallel programming models to express iterations. The paper gives a comparative analysis of the iteration strategies proposed by Apache Spark, DryadLINQ, Apache Beam, and Apache Flink, and discusses how the model generalizes these strategies.
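To make the monoid-algebra view concrete, the following sketch (illustrative only, not taken from the paper; all function names are hypothetical) shows how typical data-flow operations such as filtering and aggregation can be expressed from a single `flatmap` primitive together with an associative merge operation, which is the style of abstraction the lower level of the model captures.

```python
# Illustrative sketch: data-flow operations in a monoid-algebra style.
# `flatmap` plays the role of a monoid homomorphism on collections, and
# aggregation is driven by an associative merge with an identity (zero).

def flatmap(f, xs):
    """Concatenate f(x) for each x: a homomorphism on the list monoid."""
    out = []
    for x in xs:
        out.extend(f(x))
    return out

def filter_op(pred, xs):
    # Filtering expressed via flatmap: keep x iff the predicate holds.
    return flatmap(lambda x: [x] if pred(x) else [], xs)

def group_by(key, merge, zero, xs):
    # Aggregation: fold each key's values with the monoid (zero, merge).
    acc = {}
    for x in xs:
        k = key(x)
        acc[k] = merge(acc.get(k, zero), x)
    return acc

data = [1, 2, 3, 4, 5]
evens = filter_op(lambda x: x % 2 == 0, data)                   # [2, 4]
sums = group_by(lambda x: x % 2, lambda a, b: a + b, 0, data)   # {1: 9, 0: 6}
```

Because `merge` is associative with identity `zero`, such aggregations can be evaluated in parallel over data partitions and combined afterwards, which is why monoid algebra is a natural fit for abstracting frameworks like Spark or Flink.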