[O'Reilly TensorFlow World 2019] Accelerating training, inference, and ML applications on NVIDIA GPUs (NVIDIA: Maggie Zhang, Nathan Luehr, Josh Romero, Pooya Davoodi, Davide Onofrio) - 专知VIP


Horovod · Maggie Zhang · Machine learning · Deep learning · NVIDIA

November 13, 2019


专知会员服务 (Zhuanzhi Member Service)

Talk title: Accelerating training, inference, and ML applications on NVIDIA GPUs

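Abstract: This talk takes a deep dive into techniques for accelerating deep learning training and inference on common deep learning and machine learning workloads. Attendees learn how DALI can eliminate I/O and data processing bottlenecks in real-world applications, and how automatic mixed precision (AMP) makes it easy to get up to a 3x training speedup on Volta GPUs. The talk presents best practices for multi-GPU and multi-node scaling with Horovod, and uses a deep learning profiler to visualize TensorFlow operations and identify optimization opportunities. It also covers deploying the trained models with INT8 quantization in TensorRT (TRT), all through new, convenient APIs in the TensorFlow framework.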
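The abstract credits automatic mixed precision with the training speedup. One ingredient usually associated with mixed precision training is loss scaling: gradients that are too small for FP16 are multiplied by a scale factor before being stored in half precision, then divided by the same factor in higher precision. The following is a minimal sketch of that idea in plain Python, not the TensorFlow AMP API; the helper names and the static scale of 1024 are assumptions made for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical static loss scale; real AMP implementations typically
# adjust the scale dynamically during training.
LOSS_SCALE = 1024.0


def fp16_gradient(grad, loss_scale=1.0):
    """Store a (scaled) gradient in FP16, then unscale it in higher precision."""
    stored = to_fp16(grad * loss_scale)  # value as it would be held in FP16
    return stored / loss_scale           # unscale outside of FP16


tiny_grad = 1e-8  # smaller than the smallest positive FP16 value

unscaled = fp16_gradient(tiny_grad)             # lost to underflow
rescued = fp16_gradient(tiny_grad, LOSS_SCALE)  # survives thanks to scaling

print("without loss scaling:", unscaled)
print("with loss scaling:   ", rescued)
```

Without scaling the gradient is flushed to zero, so the corresponding weight would never update; with scaling it is recovered to within FP16 rounding error.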

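Speakers:

Maggie Zhang is a deep learning software engineer at NVIDIA, where she works on deep learning frameworks. She received her PhD in computer science and engineering from the University of New South Wales in Australia. Her research background includes GPU and CPU heterogeneous computing, compiler optimization, computer architecture, and deep learning.

Nathan Luehr is a senior developer technology engineer at NVIDIA, where he works on accelerating deep learning frameworks. His background is in theoretical chemistry, and he holds a PhD from Stanford University, where he worked on accelerating electronic structure calculations on GPUs.

Josh Romero is a developer technology engineer at NVIDIA. He has extensive experience in GPU computing, from porting and optimizing high-performance computing (HPC) applications to more recent work in deep learning. Josh received his PhD from Stanford University, where his research focused on developing new computational fluid dynamics methods that make better use of GPU hardware.

Pooya Davoodi is a senior software engineer at NVIDIA working on accelerating TensorFlow on NVIDIA GPUs. Previously, Pooya worked on Caffe2, Caffe, cuDNN, and other CUDA libraries.

Davide Onofrio is a senior deep learning software technical marketing engineer at NVIDIA, where he focuses on developing and presenting deep learning technical content for developers. Davide has several years of experience as a computer vision and machine learning engineer in the biometrics, VR, and automotive industries. He received his PhD in signal processing from Politecnico di Milano.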


Accelerating training, inference, and ML applications on NVIDIA GPUs Presentation.pdf


Related content

Horovod

Horovod is a distributed training framework for TensorFlow, Keras, PyTorch, and MXNet. Its goal is to make distributed deep learning fast and easy to use.
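Data-parallel training of the kind Horovod provides relies on averaging gradients across workers, and Horovod is commonly described as doing this with a ring-allreduce. The sketch below simulates a ring-allreduce in plain Python to show the communication pattern; it does not use or reproduce Horovod's API, and the function name and the one-chunk-per-worker layout are assumptions made to keep the example short.

```python
def ring_allreduce_mean(grads):
    """Average per-worker gradients with a simulated ring-allreduce.

    grads[r] is the gradient held by worker r. For simplicity each gradient
    has exactly one element (chunk) per worker.
    """
    n = len(grads)
    data = [list(g) for g in grads]
    assert all(len(g) == n for g in data), "need one chunk per worker"

    # Phase 1, reduce-scatter: in each step every worker sends one chunk to
    # its right-hand neighbour, which adds it to its own copy. After n - 1
    # steps worker r holds the full sum of chunk (r + 1) % n.
    for step in range(n - 1):
        msgs = [((r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, (idx, val) in enumerate(msgs):
            data[(r + 1) % n][idx] += val

    # Phase 2, all-gather: the completed chunks travel around the ring and
    # overwrite the partial sums, so every worker ends with every total.
    for step in range(n - 1):
        msgs = [((r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, (idx, val) in enumerate(msgs):
            data[(r + 1) % n][idx] = val

    # Divide by the number of workers to turn sums into averages.
    return [[x / n for x in row] for row in data]


worker_grads = [
    [1.0, 2.0, 3.0],  # worker 0
    [4.0, 5.0, 6.0],  # worker 1
    [7.0, 8.0, 9.0],  # worker 2
]
averaged = ring_allreduce_mean(worker_grads)
print(averaged)
```

Each worker only ever talks to its neighbour, and every worker finishes with the same averaged gradient, which is what lets all model replicas apply an identical update.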
