Input pipelines, which ingest and transform input data, are an essential part of training Machine Learning (ML) models. However, implementing efficient input pipelines is challenging, as it requires reasoning about parallelism, asynchrony, and variability in fine-grained profiling information. Our analysis of over 2 million ML jobs in Google datacenters reveals that a significant fraction of model training jobs could benefit from faster input data pipelines. At the same time, our analysis shows that most jobs do not saturate host hardware, pointing toward software-based bottlenecks. Motivated by these findings, we propose Plumber, a tool for finding bottlenecks in ML input pipelines. Plumber uses an extensible and interpretable analytical model based on operational analysis to automatically tune parallelism, prefetching, and caching under host resource constraints. Across five representative ML pipelines, Plumber obtains speedups of up to 46x for misconfigured pipelines. By automating caching, Plumber obtains end-to-end speedups of over 40% compared to state-of-the-art tuners.
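To make the three tuning knobs concrete, the sketch below is an illustrative toy input pipeline, not Plumber's actual implementation or API: parallelism corresponds to the number of worker threads running the transform, prefetching to a bounded buffer decoupling producer and consumer, and caching to memoization of transformed items. All names here are hypothetical.

```python
# A minimal sketch (assumed, not from the paper) of an input pipeline
# exposing the knobs the abstract describes: parallelism, prefetching,
# and caching. Uses only the Python standard library.
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread

def run_pipeline(items, transform, parallelism=4, prefetch=8, cache=True):
    memo = {}                       # cache: memoize transformed items
    buf = Queue(maxsize=prefetch)   # prefetch: bounded buffer between stages
    SENTINEL = object()             # marks end of stream

    def transform_cached(x):
        if cache and x in memo:
            return memo[x]
        y = transform(x)
        if cache:
            memo[x] = y
        return y

    def produce():
        # parallelism: transform items on a pool of worker threads;
        # pool.map preserves input order.
        with ThreadPoolExecutor(max_workers=parallelism) as pool:
            for out in pool.map(transform_cached, items):
                buf.put(out)
        buf.put(SENTINEL)

    Thread(target=produce, daemon=True).start()
    while (item := buf.get()) is not SENTINEL:
        yield item
```

In a real framework such as tf.data these knobs appear as `num_parallel_calls`, `prefetch`, and `cache`; a tuner like Plumber searches over such settings subject to host CPU, memory, and disk constraints.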