As machine learning and other algorithms grow in complexity and data requirements, distributed computing becomes necessary to satisfy the growing computational and storage demands, because it enables parallel execution of the smaller tasks that make up a large computing job. However, random fluctuations in task service times lead to straggling tasks with long execution times. Redundancy, in the form of task replication and erasure coding, provides diversity that allows a job to complete once only a subset of the redundant tasks has been executed, thus removing the dependency on straggling tasks. When resources are constrained (here, a fixed number of parallel servers), increasing redundancy reduces the resources available for parallelism. In this paper, we characterize the diversity vs. parallelism trade-off and identify the optimal strategy, among replication, coding, and splitting, that minimizes the expected job completion time. We consider three common service time distributions and establish three models describing how these distributions scale with task size. We find that different distributions under different scaling models operate optimally at different levels of redundancy, and thus may require very different code rates.
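The trade-off can be illustrated with a small Monte Carlo sketch. Below, a unit job on n servers is split into k equal tasks and MDS-coded into n coded tasks (code rate k/n), so the job finishes when the k fastest tasks are done. The service-time model here is an illustrative assumption, not the paper's exact models: a shifted exponential whose deterministic shift scales linearly with the task size 1/k, while the random part does not scale. The function name `job_time` and all parameter values are hypothetical.

```python
import random
import statistics

def job_time(n_servers, k, n_trials=20000, shift=1.0, rate=1.0):
    """Estimate the expected completion time of a unit job split into k
    tasks, MDS-coded into n_servers coded tasks (one per server).
    The job completes when the k fastest tasks finish, i.e. at the
    k-th order statistic of the task service times.

    Assumed service-time model (illustrative only): shifted exponential
    with shift proportional to task size 1/k and a non-scaling
    exponential component with the given rate.
    """
    times = []
    for _ in range(n_trials):
        # Draw one service time per server, sorted ascending.
        tasks = sorted(shift / k + random.expovariate(rate)
                       for _ in range(n_servers))
        times.append(tasks[k - 1])  # job done when the k-th fastest finishes
    return statistics.fmean(times)

# Sweep the code rate k/n for n = 10 servers:
#   k = 1  -> pure replication (maximum diversity, no parallelism)
#   k = 10 -> pure splitting   (maximum parallelism, no diversity)
n = 10
for k in (1, 2, 5, 10):
    print(f"k={k:2d} (rate {k}/{n}): E[T] ~ {job_time(n, k):.3f}")
```

Sweeping k exposes the trade-off the abstract describes: small k buys diversity against stragglers at the cost of larger per-task work, while large k shrinks each task but leaves the job exposed to the slowest servers; which extreme (or interior point) wins depends on the assumed distribution and its scaling with task size.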