System-aware dynamic partitioning for batch and streaming workloads - Zhuanzhi Papers

Streams · Processing (programming language) · Partitioning · Spark · Stream Processing

May 31, 2021

System-aware dynamic partitioning for batch and streaming workloads

Zoltán Zvara,Péter G. N. Szabó,Balázs Barnabás Lóránt,András A. Benczúr

from arxiv, 14 pages, 8 figures

When processing data streams with highly skewed and nonstationary key distributions, we often observe overloaded partitions because hash partitioning fails to balance the data correctly. To avoid slow tasks that delay the completion of a whole stage of computation, it is necessary to apply adaptive, on-the-fly partitioning that continuously recomputes an optimal partitioner from the observed key distribution. While such solutions exist for batch processing of static data sets and for stateless stream processing, the task is difficult for long-running stateful streaming jobs where the key distribution changes over time. Careful checkpointing and operator state migration are necessary to change the partitioning while the operation is running. Our key result is a lightweight on-the-fly Dynamic Repartitioning (DR) module for distributed data processing systems (DDPS), including Apache Spark and Flink, which improves performance with negligible overhead. DR can adaptively repartition data during execution using our Key Isolator Partitioner (KIP). In our experiments with real workloads and power-law distributions, we reach speedups of 1.5x to 6x for a variety of Spark and Flink jobs.
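The abstract does not say how KIP works internally, so the following is only a rough sketch of the general idea of isolating heavy keys, not the authors' algorithm. It assumes a histogram of observed key frequencies is available, pins the most frequent keys to the least-loaded partitions one at a time, and falls back to plain hashing for every other key. All names here (`isolating_partitioner`, `max_isolated`, the synthetic `observed` histogram) are hypothetical and chosen for illustration.

```python
import zlib
from collections import Counter


def stable_hash(key: str) -> int:
    # crc32 is deterministic across runs, unlike Python's salted hash() for str.
    return zlib.crc32(key.encode("utf-8"))


def hash_partitioner(num_partitions: int):
    # Baseline: the partition depends only on the key's hash.
    return lambda key: stable_hash(key) % num_partitions


def isolating_partitioner(key_counts: Counter, num_partitions: int, max_isolated: int):
    # Assumption: a simple greedy placement, used here as a stand-in for
    # whatever the paper's KIP actually does.
    loads = [0] * num_partitions
    explicit = {}
    # Visit the heaviest keys first and put each on the least-loaded partition.
    for key, count in key_counts.most_common(max_isolated):
        target = loads.index(min(loads))
        explicit[key] = target
        loads[target] += count
    fallback = hash_partitioner(num_partitions)
    # Keys outside the heavy set are still hash partitioned.
    return lambda key: explicit[key] if key in explicit else fallback(key)


def partition_loads(key_counts, partitioner, num_partitions):
    # Total number of records each partition would receive.
    loads = [0] * num_partitions
    for key, count in key_counts.items():
        loads[partitioner(key)] += count
    return loads


# Synthetic, Zipf-like key histogram (made-up data, not from the paper).
observed = Counter({f"k{i}": 1000 // (i + 1) for i in range(100)})
NUM_PARTITIONS = 4

by_hash = partition_loads(observed, hash_partitioner(NUM_PARTITIONS), NUM_PARTITIONS)
isolating = isolating_partitioner(observed, NUM_PARTITIONS, max_isolated=8)
by_isolation = partition_loads(observed, isolating, NUM_PARTITIONS)

print("hash partitioning, max load:", max(by_hash))
print("isolating heavy keys, max load:", max(by_isolation))
```

In a streaming setting the histogram would have to be re-estimated as the key distribution drifts, and any change to the explicit key map would require migrating the operator state for the affected keys. That is the part the abstract identifies as difficult, and this sketch does not address it.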
